RULES OF THUMB FOR DETERMINING EXPECTATIONS
OF MEAN SQUARES IN ANALYSIS OF VARIANCE* 


E. F. Schultz, Jr.


Alabama Polytechnic Institute and 
Institute of Statistics, North Carolina State College 


INTRODUCTION 


Exact procedures for determining the expected values of sample
mean squares in terms of population parameters are adequately
described in a number of places in statistical literature (1, 3, 7)†. For
simple designs with few classifications the processes can be gone through
quickly, and with practice, the expectations of such mean squares can
be written by inspection. However, when a design involves several
classifications, and particularly when the classifications are a mixture
of random and fixed variates, the processes become complex and tedious.

The purpose of this paper is to illustrate a set of simple rules which 
reduces the processes of determining the expectations of the mean 
squares of even complex analyses to practically the equivalent of de- 
termination by inspection. These rules are sufficiently general to 
cover all complexities of classification, provided the sums or means 
at each level of summarization are composed of equal numbers of 
observations and, in the case of random variates, are drawn from infinite 
populations. 

With respect to fixed and random effects two population models 
are of common occurrence (1, 5, 6): 


(1) every variate random so that all components are random except
the general mean (Eisenhart’s Model II)

(2) a mixture of random and fixed variates known oftentimes as the 
mixed model. 


Since random variates have a probability distribution but fixed 
effects do not, it is necessary to determine for each factor under in- 
vestigation whether its effects are to be regarded as fixed or random (1). 

In general, if all the treatments (or classifications) about which
inferences are to be made are included in an experiment (or survey) the
treatments or classifications are regarded as fixed. Since it would be
most unusual to make inferences about treatments or classifications
not included in an experiment (except by transformation and
interpolation of quantitative classifications) it follows that the treatments
or classifications studied in an experiment are the only treatments about
which inferences are planned (i.e., are the complete population of
treatments so far as a particular experiment is concerned) and therefore
treatments are customarily regarded as fixed.

*Contribution from the Experimental Statistics Department, North Carolina Agricultural
Experiment Station, Raleigh, North Carolina. Published with the approval of the Director of
Research as Paper No. 572 of the Journal Series.

†Numbers in parentheses refer to references cited.

If on the other hand it is wished to make inferences about an overall 
mean effect from a sample only of all the effects such as, perhaps, the 
average yield of inbred lines of corn from the observed performance of 
only a few lines, then the effects are regarded as random. 

The sampling or experimental design and procedures (which must 
be known for analysis) are also helpful in determining whether effects 
are to be regarded as fixed or random. 


THE RULES 
For Both Models 


RULE 1. Decide for each variate (sampling level or factor) whether it 
is to be regarded as fixed or random and assign it a letter to be used 
both as a designating symbol and as a coefficient indicating the number 
of such individuals. List the sources of variation in the analysis of 
variance, completely identifying each source by means of the selected 
symbols. 

It is helpful in naming the sources of variation and components, and 
in preventing omissions of components, if sources are listed in hierarchal 
order. Hierarchal is used in its broader sense to include hierarchy 
involving cross classified variates as occurs in the split plot design. 


RULE 2. List in the expectation of each mean square the component 
due directly to that particular source. Completely identify the com- 
ponent by using as subscripts all of the symbols necessary to completely 
identify or describe the source; in which case all of the remaining symbols 
become coefficients of the component. This procedure completely 
identifies the totality of components which must be considered. List 
as other components in the expectation of a particular mean square all 
other components whose identifying subscripts contain all of the 
symbols necessary to completely describe the source of the mean square 
under consideration. 

It is helpful if the order of the subscripts is such that the first symbols
following $\sigma^2$ describe the origin of the variation while the remainder
(enclosed in parentheses) indicate the position in the hierarchy at which
the component arises. The subscripts describing the origin of the
variation will, for purposes of distinction, be referred to as “essential”
or “truly descriptive”. If the suggested procedure of ordering
subscripts is followed (as it is in this paper) we may define the “essential”
or “truly descriptive” subscripts in a mechanical manner as those
immediately following $\sigma^2$ and not enclosed by parentheses.


For the Mixed Model 


If there are fixed effects (either one or more) then Rules 1 and 2 
still hold by virtue of adding Rule 3 specifying certain deletions from 
expectations obtained by Rules 1 and 2. 


RULE 3. To determine which components should be deleted consider
each component in the following manner. Among the “essential” or
“truly descriptive” subscripts of the component under consideration
ignore or delete from consideration those one or more subscript symbols
which are necessary to describe the source of variation in which the
component is listed. If any of the remaining “essential” subscripts
specifies a fixed effect, delete the component from the expectation.

The necessity for Rule 3 arises from the fact that in the case of a 
fixed effect the total population has been included and there is no 
component of uncertainty in the estimate due to having sampled the 
population. If the method of sampling leads to cross classification of a 
fixed effect with a random variate then the resulting interaction gives 
rise to a component which is “random in one direction only”, i.e., such
a component does exist as a part of the expectation of the mean square 
of the fixed effect (since measured over the random variate) but does 
not exist as a part of the expectation of the random variate (since 
measured over the fixed effect) (1). 

For purposes of distinction a component due directly to a fixed
effect is denoted by $\theta^2$.
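The three rules are mechanical enough to be written as a short routine. The following is a minimal sketch, not the author's own procedure: the function name, the plain-text labels (`sigma^2_...`, `theta^2_...`), and the small illustrative design (a fixed treatments crossed with b random blocks, n random units per cell) are all assumptions made for the illustration. Each source is described by its "essential" subscripts and its parenthesised subscripts, and the sources are assumed to be listed in hierarchal order.

```python
from typing import Dict, List, Tuple

# A source (and its component) is written as
# (essential subscripts, subscripts enclosed in parentheses).
Source = Tuple[str, str]


def expected_mean_squares(kinds: Dict[str, str],
                          sources: Dict[str, Source]) -> Dict[str, List[str]]:
    """Apply Rules 1-3; `kinds` maps each symbol to 'random' or 'fixed'."""
    symbols = "".join(kinds)  # Rule 1: one letter per variate
    table = {}
    for name, (ess, nest) in sources.items():
        needed = set(ess + nest)  # symbols that describe this source
        terms = []
        # Walk from the smallest units upward, so that each expectation
        # ends with the component due directly to its own source.
        for c_ess, c_nest in reversed(list(sources.values())):
            subscripts = set(c_ess + c_nest)
            if not needed <= subscripts:       # Rule 2: must contain the source's symbols
                continue
            leftover = set(c_ess) - needed     # Rule 3: remaining essential subscripts
            if any(kinds[s] == "fixed" for s in leftover):
                continue
            coef = "".join(s for s in symbols if s not in subscripts)
            greek = "theta" if all(kinds[s] == "fixed" for s in c_ess) else "sigma"
            label = c_ess + (f"({c_nest})" if c_nest else "")
            terms.append(f"{coef} {greek}^2_{label}".strip())
        table[name] = terms
    return table


# Hypothetical design used only for illustration.
kinds = {"n": "random", "a": "fixed", "b": "random"}
sources = {
    "B": ("b", ""),
    "A": ("a", ""),
    "A x B": ("ab", ""),
    "Units in A x B": ("n", "ab"),
}
for name, terms in expected_mean_squares(kinds, sources).items():
    print(f"E(M.S. {name}) = " + " + ".join(terms))
```

In this sketch the interaction component is dropped from the expectation for B (the remaining essential subscript a is fixed) but kept in the expectation for A, which is the "random in one direction only" behaviour described above.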


EXAMPLES 


An Example with Simple Sampling and Subsampling, All Variates 
Random 


Suppose, in order to estimate the firmness of peaches in a certain 
location during a particular season, one may have made duplicate 
determinations of the firmness of peaches chosen in the following manner: 
a definite number of peaches chosen at random from each tree of a 
sample of trees in the location. 

Following Rule 1 we list the sources of variation as in the first column
of Table 1. It is convenient to designate trees by t which, when used
as a coefficient, also designates the number of trees. Since the trees
are only a random sample of the trees producing the peaches whose
firmness we wish to estimate, we may correctly decide that trees are
random.


TABLE 1
Structural Analysis and E(M.S.) for a Sampling Scheme Investigating Fruit Firmness
by Means of d Duplicate Determinations on Each of f Fruit from Each of t
Trees, All Components Random Except the Mean.


Source of Variation       d.f.         E(M.S.)

Total                     dft − 1
Trees (T)                 t − 1        $\sigma^2_{d(f)(t)} + d\sigma^2_{f(t)} + df\sigma^2_t$
Fruits (F) in T           (f − 1)t     $\sigma^2_{d(f)(t)} + d\sigma^2_{f(t)}$
Detns. (D) in F in T      (d − 1)ft    $\sigma^2_{d(f)(t)}$


Fruit may be designated by f which, when used as a coefficient,
also designates the number of fruit per tree. Since the individual fruit
were chosen by random means, they are properly regarded as random
samples of the fruit on the trees from which they were harvested.

The duplicate determinations made on each fruit are designated by
d which, when used as a coefficient, also designates the number of
determinations per fruit. Duplicates can hardly be regarded otherwise
than as representing random effects.

We see now that the model with all components random except the 
general mean is appropriate. 

Following Rule 2 we list for each source of variation a component
due directly to that source. For each mean square this is the component
listed last. For the last listed source of variation, that of the ultimate
units of the experiment, we find the component to be $\sigma^2_{d(f)(t)}$, which is
the expected mean square of that source, Determinations in Fruit in
Trees. It sometimes happens that the basic unit of variation represents
two or more components, but if so, they are confounded and are treated
as a single component.

Advancing to Fruit in Trees it is easily verified that the subscripts
in $\sigma^2_{d(f)(t)}$ contain f and t, the symbols necessary to fully describe the
source, Fruit in Trees, hence $\sigma^2_{d(f)(t)}$ is a part of the expectation of the
mean square of this particular source. There is also the component
due directly to the source, in this case $\sigma^2_{f(t)}$. Since this component
requires only f and t for designation, the remaining symbols, only d in
this case, appear as coefficients giving $d\sigma^2_{f(t)}$. The expectation of
$\text{M.S.}_{F \text{ in } T}$ is $\sigma^2_{d(f)(t)} + d\sigma^2_{f(t)}$ as shown in Table 1.

Advancing now to consideration of the expectation of $\text{M.S.}_T$ we
find that $\sigma^2_{d(f)(t)}$ contains t, so that $\sigma^2_{d(f)(t)}$ is part of the expectation of
Trees. Also $\sigma^2_{f(t)}$ contains t so that the component due directly to the
Fruits in Trees ($\sigma^2_{f(t)}$ with coefficient d) is also a part of the expectation
of Trees. There is also a component due directly to Trees, $\sigma^2_t$, with the
remaining symbols as coefficients yielding $df\sigma^2_t$. The expectation of
$\text{M.S.}_T$ is then $\sigma^2_{d(f)(t)} + d\sigma^2_{f(t)} + df\sigma^2_t$ as shown in Table 1.
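One common use of such expectations is to equate observed mean squares to them and solve for the components. The sketch below does this for the three expectations of Table 1; the function name and the numerical mean squares are hypothetical, chosen only to show the arithmetic, and the paper itself does not prescribe this estimation step.

```python
def nested_variance_components(ms_trees, ms_fruit, ms_detn, d, f):
    """Solve the Table 1 expectations for the three components.

    E(M.S. Detns.) = s2_d
    E(M.S. Fruits) = s2_d + d * s2_f
    E(M.S. Trees)  = s2_d + d * s2_f + d * f * s2_t
    """
    var_detn = ms_detn
    var_fruit = (ms_fruit - ms_detn) / d
    var_tree = (ms_trees - ms_fruit) / (d * f)
    return var_tree, var_fruit, var_detn


# Hypothetical mean squares with d = 2 determinations and f = 3 fruit per tree.
print(nested_variance_components(ms_trees=20.0, ms_fruit=8.0, ms_detn=2.0, d=2, f=3))
```

Because each expectation differs from the one below it by a single term, the components come out by successive subtraction, which is the practical benefit of listing sources in hierarchal order.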


An Example with Both Cross Classification and Sampling, All Variates 
Random 


Suppose now, that in order to take account of the day to day vari- 
ability which may exist, we repeat the sampling procedure on the same 
trees on each of several days not chosen for any characteristic. 

Following Rule 1 we assign q to indicate days when used as a sub- 
script and to indicate the number of days when used as a coefficient. 
The days are to be regarded as having random effects since they were 
not chosen to represent any special characteristic of days and no infer- 
ences about the effects of various kinds of days are contemplated. 

We may observe that again we have the model with all components 
random except the general mean. At some levels we have again used 
simple random sampling (fruits and duplicate determinations). As 
regards days and trees however, while each was selected in a random 
fashion, observations were repeated on the same trees on the different 
days. This leads to cross classification of the observations and one of 
the sources of variation will now be the result of interaction or dis- 
crepance. 

The sources of variation in this experiment are shown in the first 
column of Table 2. 


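TABLE 2
Structural Analysis and E(M.S.) for a Sampling Scheme Investigating Fruit Firmness
by Means of d Duplicate Determinations on Each of f Fruits from Each of t
Trees, the Whole Repeated on the Same Trees on q Days, All Components Random
Except the Mean.


Source of Variation           d.f.              E(M.S.)

Total                         dfqt − 1
Trees (T)                     t − 1             $\sigma^2_{d(f)(qt)} + d\sigma^2_{f(qt)} + df\sigma^2_{qt} + dfq\sigma^2_t$
Days (Q)                      q − 1             $\sigma^2_{d(f)(qt)} + d\sigma^2_{f(qt)} + df\sigma^2_{qt} + dft\sigma^2_q$
Q × T                         (q − 1)(t − 1)    $\sigma^2_{d(f)(qt)} + d\sigma^2_{f(qt)} + df\sigma^2_{qt}$
Fruits (F) in Q × T           (f − 1)qt         $\sigma^2_{d(f)(qt)} + d\sigma^2_{f(qt)}$
Detns. (D) in F in Q × T      (d − 1)fqt        $\sigma^2_{d(f)(qt)}$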
Listing for each source of variation the component due to that
source we find opposite $\text{M.S.}_{D \text{ in } F \text{ in } Q \times T}$, the source of unit variance, its
expectation $\sigma^2_{d(f)(qt)}$.

Considering $\text{M.S.}_{F \text{ in } Q \times T}$ it is plain that the subscripts of $\sigma^2_{d(f)(qt)}$
contain f, q, and t, the symbols necessary to identify the source under
consideration so the component $\sigma^2_{d(f)(qt)}$ is a component of the expected
value of $\text{M.S.}_{F \text{ in } Q \times T}$. This component together with the component
due directly to the source, $\sigma^2_{f(qt)}$ with coefficient d, comprise the
expectation of $\text{M.S.}_{F \text{ in } Q \times T}$.

The procedure is followed until we find the expectation of $\text{M.S.}_T$
to be

$\sigma^2_{d(f)(qt)} + d\sigma^2_{f(qt)} + df\sigma^2_{qt} + dfq\sigma^2_t$.


A More Complex Example with Both Cross Classification and Sampling, 
All Variates Random 


Actually such an experiment as described in the previous example
might be repeated at a number of locations in order to obtain an estimate
for the region rather than a particular location (Table 3). It might
also be that, though the days were randomly chosen, the work was so
coordinated that the observations were made on the same days at the
different locations.


TABLE 3
Structural Analysis and E(M.S.) for a Sampling Scheme Investigating Fruit Firmness
by Means of d Duplicate Determinations on Each of f Random Fruit from Each of
t Random Trees in Each of l Random Locations, the Whole System Repeated on the
Same Trees on Each of q Random Days.


                                                       Components of E(M.S.)
Source of Variation               d.f.                 (1)  (2)  (3)  (4)  (5)  (6)  (7)

Total                             dfqtl − 1
Locations (L)                     l − 1                 x    x    x    x         x    x
Trees (T) in L                    (t − 1)l              x    x    x              x
Days (Q)                          q − 1                 x    x    x    x    x
Q × L                             (q − 1)(l − 1)        x    x    x    x
Q × T in L                        (q − 1)(t − 1)l       x    x    x
Fruits (F) in Q × T in L          (f − 1)qtl            x    x
Detns. (D) in F in Q × T in L     (d − 1)fqtl           x

Components: (1) $\sigma^2_{d(f)(qtl)}$; (2) $d\sigma^2_{f(qtl)}$; (3) $df\sigma^2_{qt(l)}$; (4) $dft\sigma^2_{ql}$;
(5) $dftl\sigma^2_q$; (6) $dfq\sigma^2_{t(l)}$; (7) $dfqt\sigma^2_l$.

Following Rule 1 we assign the symbol l to locations and decide,
since the locations were chosen only to represent the region, that
locations are to be regarded as a random variate.

Further application of the rules leads to the expectations in Table 
3. Instead of writing out each component with its necessary list of 
coefficients and subscripts each time it occurs in Table 3, there is pro- 
vided for each component a column which is merely checked if the 
component is a part of the expectation of a mean square under con- 
sideration. This example demonstrates that, even with a complex 
experiment, application of the proposed rules leads to the correct 
expectations. It will be used later to illustrate Rule 3. 
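The check-column layout can be generated mechanically from Rule 2 alone, since every variate in this example is random. The sketch below is an illustration under stated assumptions: the component list and the row for Days are written out from the rules and the design described above, and the names `COMPONENTS`, `SOURCES`, and `check_row` are hypothetical.

```python
# Table 3 components as (coefficient, essential subscripts, parenthesised subscripts).
COMPONENTS = [
    ("", "d", "fqtl"),
    ("d", "f", "qtl"),
    ("df", "qt", "l"),
    ("dft", "ql", ""),
    ("dftl", "q", ""),
    ("dfq", "t", "l"),
    ("dfqt", "l", ""),
]

# Each source with the symbols needed to describe it.
SOURCES = {
    "Locations (L)": "l",
    "Trees (T) in L": "tl",
    "Days (Q)": "q",
    "Q x L": "ql",
    "Q x T in L": "qtl",
    "Fruits (F) in Q x T in L": "fqtl",
    "Detns. (D) in F in Q x T in L": "dfqtl",
}


def check_row(source_symbols):
    """Mark 'x' where a component's subscripts contain all symbols of the source."""
    return ["x" if set(source_symbols) <= set(ess + nest) else "."
            for _, ess, nest in COMPONENTS]


for name, symbols in SOURCES.items():
    print(f"{name:32s}", " ".join(check_row(symbols)))
```

Each printed row corresponds to one mean square and each column to one component, so a row of marks is read off as an expectation by attaching the listed coefficients.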


An Example of Cross Classification, Fixed Effects with One Random 
Variate 


It is entirely possible that one’s primary aim in investigating peaches 
could have been to determine whether different pruning methods applied
to peach trees affect the firmness of the fruit differently. In this case 
one might have selected several blocks of trees, which because of their 
appearance and contiguity were judged to be similar trees, and have 
allotted the pruning treatments one per tree to the several trees of a 
block, repeating the procedure in each block. The plan of selecting f 
fruit from each tree and making d determinations on each fruit might 
well have been continued. Suppose we have data at hand collected 
by such a procedure and that there are results for one day only. 

Following Rule 1 we would conclude that determinations and fruit
are still random. Trees also are still random but they have been replaced
by blocks of trees, or replications, which give observations that are cross 
classifiable with respect to prunings. The pruning, however, is entirely 
at the disposal of the experimenter. He will choose to prune in certain 
fashions, and he will draw inferences about the effects of pruning in 
these certain fashions, but in no other. For purposes of consideration, 
then, the entire population of pruning methods is represented in the 
experiment. As a consequence there is no variability due to sampling 
the population of pruning methods and we consider the effects of prunings 
to be fixed (or constant). 

We have then p fixed prunings on single trees in each of r random 
replications, with f random fruit per tree, and d random duplicate 
determinations per fruit. 

Application of Rules 1 and 2 leads to the components listed in 
Table 4. 
TABLE 4
Structural Analysis and E(M.S.) for a Sampling Scheme Investigating Fruit Firmness
of p Fixed Prunings Imposed on Single Trees in Each of r Random Replications with
f Random Fruit per Tree and d Random Determinations per Fruit.


Source of Variation           d.f.              E(M.S.)*

Total                         dfrp − 1
Replications (R)              r − 1             $\sigma^2_{d(f)(pr)} + d\sigma^2_{f(pr)} + \underline{df\sigma^2_{pr}} + dfp\sigma^2_r$
Prunings (P)                  p − 1             $\sigma^2_{d(f)(pr)} + d\sigma^2_{f(pr)} + df\sigma^2_{pr} + dfr\theta^2_p$
P × R                         (p − 1)(r − 1)    $\sigma^2_{d(f)(pr)} + d\sigma^2_{f(pr)} + df\sigma^2_{pr}$
Fruits (F) in P × R           (f − 1)pr         $\sigma^2_{d(f)(pr)} + d\sigma^2_{f(pr)}$
Detns. (D) in F in P × R      (d − 1)fpr        $\sigma^2_{d(f)(pr)}$


*Underscored components do not exist under the conditions assumed.


Applying Rule 3 to component $df\sigma^2_{pr}$ in the expectation of the mean
square for replications, $E(\text{M.S.}_R)$, we find that we are required to ignore
or delete or cancel from consideration “essential” or “truly descriptive”
subscript r (immediately following $\sigma^2$ and not enclosed in parentheses)
because the symbol r is required in the description of the source. This
leaves only subscript p. Since p, a remaining “essential” subscript,
represents a fixed effect the component is deleted from the expectation.
The deletion is indicated in Table 4 by underscoring $df\sigma^2_{pr}$ so that
$E(\text{M.S.}_R)$ is $\sigma^2_{d(f)(pr)} + d\sigma^2_{f(pr)} + dfp\sigma^2_r$. This is the only component
deleted from Table 4 by application of Rule 3.


A Complex Example of Cross Classification, Two Sets of Fixed Effects 
which Cross Classify with Two Random Variates which Cross Classify 


In actuality the investigator might simultaneously investigate the
effect of pruning on firmness of both ripe and green peaches and, as in
our second example, he might also investigate whether there were day
to day variations in the effects.

There would then be pm combinations of p fixed prunings with m
fixed maturities investigated on single trees in r random replications
repeated on the same trees on each of q random days with f fruit being
taken at random from each tree each day with d duplicate
determinations of firmness being made on each fruit.

We proceed again by Rules 1 and 2 laid down for the case of all 
variates random with the idea that we will later use Rule 3 to strike out 
such components as do not exist because of the different behavior 
of components when the model includes fixed effects. We have then 
Table 5. 


EXPECTATIONS OF M.S. 


TABLE 5
Structural Analysis and E(M.S.) for a Sampling Scheme of m Fixed Maturities on
Each of p Fixed Prunings Imposed on Single Trees in Each of r Random Replications,
Repeated on the Same Trees on Each of q Random Days, by Means of d Determinations
on Each of f Random Fruit.

[Table body not legible in this copy. As in Table 3, each row is a source of
variation with its d.f. and each column is a component, checked where that
component is a part of the expectation of the mean square.]

132 BIOMETRICS, JUNE 1955 


For a specific example of the operation of Rule 3 consider in Table
5 the expectation of Prunings mean square, $E(\text{M.S.}_P)$. Starting with
components due to smaller units in the first 2 columns we note that
the “essential” subscripts of $\sigma^2_{d(f)(mpqr)}$ and $d\sigma^2_{f(mpqr)}$ include only
subscripts representing random variates so that the conclusion regarding
presence or absence of these components will not be affected by the
application of Rule 3.

In the third column we find a component due to interaction,
$df\sigma^2_{mpqr}$, with 4 “essential” subscripts. Deleting p, the symbol necessary
to describe Prunings, we have remaining m, q, and r. Since m, one of
the remaining “essential” subscripts, represents a fixed effect this
component, which would exist as a part of the expectation of Prunings
if all variates were random, is not a part of the expectation under the
assumption that maturities are fixed. In the next column we find the
component $dfr\sigma^2_{mpq}$, whose “essential” subscripts contain m and q
after deleting p. Since m represents a fixed effect this component does
not exist in the expectation of Prunings. The presence of m in the
“essential” subscripts of component $dfq\sigma^2_{mpr}$ and component $dfqr\theta^2_{mp}$
also precludes these components being a part of $E(\text{M.S.}_P)$. The next
three components to be considered are $dfm\sigma^2_{pqr}$, $dfmr\sigma^2_{pq}$, and $dfmq\sigma^2_{pr}$.
In each case, after deleting p, the subscript necessary to describe
Prunings, the remaining “essential” subscripts represent only random
variates, qr, q, and r respectively, so that these components are a part
of $E(\text{M.S.}_P)$. It should hardly be necessary to remark that $dfmqr\theta^2_p$
is necessarily a part of $E(\text{M.S.}_P)$.


A MORE DIRECT PROCEDURE APPLICABLE TO ISOLATED MEAN SQUARES 


Now that the rules of thumb have been enumerated and illustrated 
it may be meaningful to state the composition of an expected mean 
square more directly. 

The expectation of any mean square contains, in addition to a 
component due directly to the source under consideration, all those 
components whose subscript symbols include the set of symbols neces- 
sary to completely describe the source, provided there are only random 
variates represented in the “essential” subscripts after cancelling those
symbols necessary to describe the source of variation under con- 
sideration. 
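The direct statement can be sketched for designs of the kind used in this paper, in which a set of cross-classified variates carries one or more levels of random subsampling. The function below is an illustrative assumption rather than a general implementation: its name, its arguments, and the plain-text labels are hypothetical, and it assumes every crossed variate crosses every other and that all sampling variates are random.

```python
from itertools import combinations


def isolated_ems(source, crossed, sampling, fixed):
    """Components in E(M.S.) of one source, written as a string of crossed symbols.

    crossed  : symbols of the cross-classified variates, e.g. "mpqr"
    sampling : sampling symbols from finest to coarsest, e.g. "df"
    fixed    : symbols of the variates regarded as fixed, e.g. "mp"
    """
    order = sampling + crossed
    terms = []
    # Sampling components contain every crossed symbol and are always present.
    for i, s in enumerate(sampling):
        nested = sampling[i + 1:] + crossed
        coef = "".join(x for x in order if x not in s + nested)
        terms.append(f"{coef} sigma^2_{s}({nested})".strip())
    # Crossed components: every set of symbols that includes the source.
    for size in range(len(crossed), 0, -1):
        for combo in combinations(crossed, size):
            if not set(source) <= set(combo):
                continue
            if (set(combo) - set(source)) & set(fixed):
                continue  # a fixed variate remains after cancelling the source
            sub = "".join(combo)
            coef = "".join(x for x in order if x not in sub)
            greek = "theta" if set(sub) <= set(fixed) else "sigma"
            terms.append(f"{coef} {greek}^2_{sub}")
    return terms


# Prunings with maturities fixed, then with maturities random.
print(isolated_ems("p", crossed="mpqr", sampling="df", fixed="mp"))
print(isolated_ems("p", crossed="mpqr", sampling="df", fixed="p"))
```

With maturities fixed the sketch returns six components for Prunings; with maturities random it returns ten, the extra four being the interactions that involve m.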


Examples 


In the case illustrated in Table 4 the expected mean square for
Prunings contains, in addition to the component due directly to
Prunings, two components due to the two random sampling variates, Fruit
and Determinations, and a single component representing interaction
or discrepance resulting from the cross classification of Prunings with a
single random variate, Replications, thus:


$E(\text{M.S.}_P) = \sigma^2_{d(f)(pr)} + d\sigma^2_{f(pr)} + df\sigma^2_{pr} + dfr\theta^2_p$


In the case illustrated in Table 5 the expectation of Prunings mean 
square contains, in addition to the component due directly to Prunings, 
the two components due to the two sampling variates, Fruit and 
Determinations, plus three components representing interactions of 
Prunings with the three forms of variability, Replications (R), Days 
(Q), and Q X R, resulting from the cross classification of the two random 
variates Replications and Days, thus: 


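$$\begin{aligned}
E(\text{M.S.}_P) = {} & \sigma^2_{d(f)(mpqr)} + d\sigma^2_{f(mpqr)} + dfm\sigma^2_{pqr} \\
& + dfmr\sigma^2_{pq} + dfmq\sigma^2_{pr} + dfmqr\theta^2_p .
\end{aligned}$$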


Should it have been the case that maturities were also regarded as
random, then there would have been three random variates expressed
in seven different forms (R, Q, QR, M, MR, MQ, and MQR) so that
$E(\text{M.S.}_P)$ would include, in addition to the component due directly
to Prunings and the two components due to the sampling variates,
seven components resulting from interaction or discrepance.


$$\begin{aligned}
E(\text{M.S.}_P) = {} & \sigma^2_{d(f)(mpqr)} + d\sigma^2_{f(mpqr)} + df\sigma^2_{mpqr} \\
& + dfr\sigma^2_{mpq} + dfq\sigma^2_{mpr} + dfqr\sigma^2_{mp} \\
& + dfm\sigma^2_{pqr} + dfmr\sigma^2_{pq} + dfmq\sigma^2_{pr} + dfmqr\theta^2_p .
\end{aligned}$$


That it is necessary to define the “essential” or “truly descriptive” 
subscripts, as opposed to those which merely denote the position in the 
hierarchy at which a component arises, may be shown by considering 
again the case illustrated in Table 3 but assuming now that Locations 
represent fixed effects. 

When Rule 3 is properly applied under this assumption, the only
deletion is component $dft\sigma^2_{ql}$ from the expectation of Days, $E(\text{M.S.}_Q)$.
But should one forget to distinguish between the “essential” subscripts
and subscripts in general, remembering only that Locations represent
fixed effects, then, considering the source Days, and ignoring or
cancelling the subscript q necessary to describe the source, one would
find l remaining in each component of Days excepting $\sigma^2_q$, thus indicating
that all random components should be deleted. This is obviously
incorrect.
In Table 3 it is also interesting to observe the deletions due to
regarding Days as fixed. In this case the component $df\sigma^2_{qt(l)}$ is deleted
from the expectation of Trees (T) in L and the two components $df\sigma^2_{qt(l)}$
and $dft\sigma^2_{ql}$ are deleted from the expectation of Locations.
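The distinction between "essential" subscripts and subscripts in general can be checked mechanically for the Table 3 design. In the sketch below the component list is written out from the rules, and the function name and its `essential_only` switch are hypothetical devices for contrasting the correct and the mistaken application of Rule 3.

```python
# Table 3 components as (label, essential subscripts, parenthesised subscripts).
COMPONENTS = [
    ("sigma^2_d(f)(qtl)", "d", "fqtl"),
    ("d sigma^2_f(qtl)", "f", "qtl"),
    ("df sigma^2_qt(l)", "qt", "l"),
    ("dft sigma^2_ql", "ql", ""),
    ("dftl sigma^2_q", "q", ""),
    ("dfq sigma^2_t(l)", "t", "l"),
    ("dfqt sigma^2_l", "l", ""),
]


def deleted_components(source, fixed, essential_only=True):
    """Components struck from E(M.S.) of `source` when `fixed` variates are fixed.

    essential_only=False reproduces the mistake of examining every subscript,
    including those that only mark position in the hierarchy.
    """
    removed = []
    for label, ess, nest in COMPONENTS:
        if not set(source) <= set(ess + nest):
            continue  # not in this expectation to begin with
        examined = ess if essential_only else ess + nest
        if (set(examined) - set(source)) & set(fixed):
            removed.append(label)
    return removed


print(deleted_components("q", fixed="l"))                        # Locations fixed, Days
print(deleted_components("q", fixed="l", essential_only=False))  # the incorrect reading
print(deleted_components("tl", fixed="q"))                       # Days fixed, Trees in L
print(deleted_components("l", fixed="q"))                        # Days fixed, Locations
```

Under the correct reading a single component is struck from Days when Locations are fixed; under the incorrect reading every component except the one due directly to Days would be struck.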


SPECIAL SITUATIONS 
The Basic Unit of Variation is the Result of Interaction or Discrepance 


A special case that is frequently met is an experiment conducted
as that illustrated in Table 5 except that the firmness determination is
made by one determination only on one fruit only from each tree on
each date. In this case the basic component would be described as
$\sigma^2_{mpqr}$, a component due to interaction. It must be recognized however
that this estimate of $\sigma^2_{mpqr}$ is confounded with components due to
sampling variates such as fruit and determinations, and perhaps even
others. Since it is unknown in this case whether $\sigma^2_{mpqr}$ is large or small
relative to the other components with which it is confounded the
manner of treating $\sigma^2_{mpqr}$, the basic unit of variation, is uncertain. It
would seem wise, in most cases at least, to treat this basic unit of
variation as a component due to a single random sampling variate rather
than an interaction, in which case it would be unaffected by Rule 3
concerning deletions.


The Factorial with a Single Error Term 


If one is considering a factorial experiment of the type having p 
fixed prunings with f fixed fertilizers, the pf treatment combinations 
having been allotted at random to single trees in each of r replications, 
then the structural analysis usually is of the form following with the 
idea that “Pruning-Fertilizer Combinations” will be broken into an 
orthogonal set of comparisons for testing against a single error term. 


Source                                        d.f.

Total                                         rpf − 1
Replications (R)                              r − 1
Pruning-Fertilizer Combinations (C)           pf − 1
Error                                         (pf − 1)(r − 1)


To consider in this case that both Prunings and Fertilizers are
separate fixed effects and to blindly isolate the interaction of each of
these (and their joint effect) with replications according to the foregoing
rules will lead to a separate error term with different expectation for
each effect considered. To reconcile this circumstance with the originally
proposed structural analysis, one has only to remember that one of
the basic assumptions of this type of analysis is that the errors are
homogeneous and that, therefore, such components as $\sigma^2_{pr}$ for Pruning
× Replication, $\sigma^2_{fr}$ for Fertilizer × Replication, and $\sigma^2_{pfr}$ for Pruning ×
Fertilizer × Replication are really estimates of the same component
and therefore the three mean squares should be pooled as, say, $\sigma^2_{cr}$ for
Pruning-Fertilizer Combinations × Replication.
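That the pooled term matches the single Error line of the structural analysis can be confirmed from the degrees of freedom. The short derivation below is added as a check and uses only the symbols p, f, and r already defined for this design.

```latex
\begin{align*}
(p-1)(r-1) + (f-1)(r-1) + (p-1)(f-1)(r-1)
  &= (r-1)\bigl[(p-1) + (f-1) + (p-1)(f-1)\bigr] \\
  &= (r-1)\bigl[\,pf - 1\,\bigr] \\
  &= (pf-1)(r-1).
\end{align*}
```

The three interaction-with-replication lines therefore account for exactly the degrees of freedom shown for Error.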

Another matter exists which should be called to the reader’s atten- 
tion. When treatments are tried over two or more random variates 
which cross classify, none of the existing mean squares of the analysis 
of variance has the correct expectation to serve as error for testing the 
significances of differences among treatments. This situation exists 
in Tables 3 and 5. Error terms of the correct expectation may be 
constructed (1, 2, 8, 9). 
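As an illustration of such a construction, consider Prunings in the design of Table 5 with maturities and prunings fixed and days and replications random. The expectations below are obtained by applying the rules of this paper; the particular linear combination is a sketch of one way to match expectations and is not quoted from the references cited.

```latex
\begin{align*}
E(\text{M.S.}_{PQ})  &= \sigma^2_{d(f)(mpqr)} + d\sigma^2_{f(mpqr)} + dfm\sigma^2_{pqr} + dfmr\sigma^2_{pq} \\
E(\text{M.S.}_{PR})  &= \sigma^2_{d(f)(mpqr)} + d\sigma^2_{f(mpqr)} + dfm\sigma^2_{pqr} + dfmq\sigma^2_{pr} \\
E(\text{M.S.}_{PQR}) &= \sigma^2_{d(f)(mpqr)} + d\sigma^2_{f(mpqr)} + dfm\sigma^2_{pqr} \\[4pt]
E(\text{M.S.}_{PQ} + \text{M.S.}_{PR} - \text{M.S.}_{PQR})
  &= \sigma^2_{d(f)(mpqr)} + d\sigma^2_{f(mpqr)} + dfm\sigma^2_{pqr}
     + dfmr\sigma^2_{pq} + dfmq\sigma^2_{pr} \\
  &= E(\text{M.S.}_{P}) - dfmqr\,\theta^2_p .
\end{align*}
```

The combination has the same expectation as the Prunings mean square except for the component due directly to Prunings, which is the property required of an error term for that test.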
VARIANCE COMPONENTS WITH REFERENCE TO GENETIC
POPULATION PARAMETERS* 


Dorothy C. Lowry


University of California 


The nature of variability in a population observed at a given time 
with respect to a particular metric trait is of great interest and import- 
ance among animal and plant breeders. Their work depends heavily 
upon the ability to design breeding experiments and to take advantage 
of statistical techniques which will enable them successfully to appor- 
tion differences in such a trait to the various broad causal factors 
operating upon the individuals constituting the population. This 
they must accomplish with sufficient accuracy to describe to some 
extent the genetic and environmental complex affecting the trait and 
to predict breeding results. It has been shown, particularly by the
work of Fisher, Haldane and Wright, that for various quantitative
traits the system of genes involved does have average properties which
are measurable and the analysis of variance has proved to be a powerful
tool in the estimation of such parameters. This paper is presented as 
a review of some of the applications of variance components in statis- 
tical genetics and of some statistical problems commonly encountered 
in their use in this field. 

The situation frequently to be met in quantitative genetics is as 
follows: we have a set of data arranged in a particular type of classi- 
fication and described by a linear function of effects of various classes 
and subclasses. Generally this model is that which Eisenhart (1947) 
has called Model II, in which all elements except $\mu$ are regarded as
random variables, although it may frequently be what he called the 
Mixed Model, in which certain of the effects are regarded as fixed 
rather than as random variables. The first step then is the estimation 
of the variances of these random variables and the second step the 
linear combination of certain of these estimates to provide further 
estimates of the parameters of heredity, by which I mean any of the 
parameters, genetic and environmental, describing the variability 
of the quantitative trait.

Weinberg (1910) showed that the correlation between parent and
offspring is $\tfrac{1}{2}\sigma^2_G/\sigma^2_T$ in a random breeding population, where $\sigma^2_G$ is
the genetic variance and $\sigma^2_T$ is the total variance, if it can be assumed
that the genetic component is due entirely to autosomal factors with
effects which are additive. Fisher (1918) examined the correlations
between relatives with respect to a metric trait to be expected under
the Mendelian hypothesis, that is, under the assumption that such
traits are determined by a large number of segregating genes
distributed among the chromosomes. He considered both random and
assortative types of mating and demonstrated the effects upon these
correlations of non-additive gene action of two types:


1) dominance deviations, which pertain to single pairs of allelic
genes. When such deviations exist the values of the three
genotypes AA, Aa and aa, each averaged over the array of
environments to which the population is subjected, are a, d and
−a, respectively, where d may have any value from −a to a
and even values outside this range. In the case of no dominance,
the three genotypes would be represented by, say, b + c, b and
b − c, respectively; that is the heterozygote would be midway
between the two homozygotes in value.

2) epistatic deviations which arise from interactions between
non-allelic pairs of genes.
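For a single pair of allelic genes with the genotypic values a, d and −a given above, the additive and dominance portions of the genetic variance under random mating can be computed directly. The sketch below uses the standard single-locus expressions from quantitative genetics rather than anything stated in this paper; the function name and the gene frequency argument `freq_A` are assumptions introduced for the illustration.

```python
def single_locus_variances(freq_A, a, d):
    """Additive, dominance and total genetic variance for one locus.

    Genotypic values follow the text: AA = a, Aa = d, aa = -a.
    Random mating is assumed, so genotype frequencies are p^2, 2pq, q^2.
    """
    p, q = freq_A, 1.0 - freq_A
    mean = a * (p - q) + 2 * p * q * d
    alpha = a + d * (q - p)                  # average effect of a gene substitution
    var_additive = 2 * p * q * alpha ** 2
    var_dominance = (2 * p * q * d) ** 2
    # Total computed independently from the genotype frequencies as a check.
    var_total = (p * p * (a - mean) ** 2
                 + 2 * p * q * (d - mean) ** 2
                 + q * q * (-a - mean) ** 2)
    return var_additive, var_dominance, var_total


for d in (0.0, 0.5, 1.0):
    va, vd, vt = single_locus_variances(freq_A=0.3, a=1.0, d=d)
    print(f"d = {d}: additive {va:.4f}, dominance {vd:.4f}, total {vt:.4f}")
```

With d = 0 the dominance variance vanishes and the whole genetic variance is additive; as d departs from zero part of the total moves into the dominance term, and in every case the two parts sum to the total.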


Thus Fisher divided the genetic variance of a breeding population into 
the additively genetic variance, the variance due to dominance devia- 
tions from the additive scheme and the epistatic variance and showed 
the decrease to be expected in the correlations between individuals of 
various relationships due to the operation of dominance and epistasis. 
The extensive work of Wright (1917, 1918, 1920, 1921, 1935) on the 
correlations between any relatives as well as extensions and applications 
by a number of people working in the field of quantitative genetics 
(Mather, Lush, Lerner, et al.) enable us to partition the phenotypic 
variance of a population into an additively genetic portion and an
environmental portion under a number of assumptions of which the
most important are: 


1) Gene differences have strictly additive average effects over the 
array of environments of the population. 

2) No correlation exists between the average value of a genotype
and its environmental variance.

3) Hereditary and environmental factors are not correlated in
occurrence.

4) Random mating obtains, or a mating plan in which the
non-randomness can be determined quantitatively.
The breeding experiment can usually be designed so that the third
postulate is valid though with livestock this cannot always be con- 
trolled. The first two postulates seem to be warranted as approxi- 
mations with respect to many metrical characters governed by many 
genes each having a relatively small effect; even completely dominant 
gene differences along with differences showing two factor types of 
epistatic effects can usually be almost entirely accounted for in terms 
of additive gene action (Wright, 1935; Lush, 1945). For random 
mating the correlation between full sibs is 


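$$ \frac{\tfrac{1}{2}\sigma_G^2 + \tfrac{1}{4}\sigma_D^2 + r_{II'}\,\sigma_I^2}{\sigma_H^2} $$

where $\sigma_H^2$ is the total genetic variance; $\sigma_G^2$ the additively genetic variance; 
$\sigma_D^2$ the dominance variance; $\sigma_I^2$ the epistatic variance and $r_{II'}$ the 
correlation between the epistatic deviations of two siblings (Wright, 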
1935). Several interesting tables appear in Wright’s 1952 paper showing 
the effects of dominance for varying gene frequencies on variances and 
correlations as well as one showing an analysis of the variability of 
two-factor $F_2$'s* in which the 9:3:3:1 ratio is modified in different ways. 
An experiment having to do with models involving dominance will be 
discussed somewhat later. 

Now under the assumptions stated above, portions of the genetic 
variance are contained in $\sigma_d^2$ and $\sigma_s^2$, the variances arising from 
differences in dams and sires, respectively, obtained in the analysis of variance. 
In addition, because of segregation, a further portion of the genetic 
variance is contained in $\sigma_w^2$, the component of variance for individuals 
within full sib families. 

The environmental variance may consist of random effects entirely 
so that it is all contained in $\sigma_w^2$, or it may contain, in addition, differences 
between litters within the same full sib family, in which case we will 
have a corresponding litter contribution, $\sigma_l^2$, and finally it may contain 
differences between paternal half sibs due to differences in mothering 
ability of dam, age of dam, etc. (so-called maternal effects), which 
will be contained in $\sigma_d^2$. 

If sex linkage, a particular kind of non-allelic interaction, is operating 
we may have a reduction either in $\sigma_s^2$, the sire component, or in the 
genetic portion of $\sigma_d^2$, the dam component, depending on which sex is 
heterogametic and also on the relative effects of gene substitutions in 
X chromosomes of the two sexes. To take a simple example, if we are 
analyzing a trait expressed only in females for a population in which the 
female is the heterogametic sex and there are no maternal effects, 
$\sigma_d^2$ will be less than $\sigma_s^2$. However, since metric traits are controlled 
by the relatively small effects of many genes, the effect of any sex linkage 
is likely to be very small in most cases and generally obscured by 
sampling errors in $\sigma_d^2$ and $\sigma_s^2$. 


*Offspring of matings of individuals heterozygous with respect to each of two pairs of genes. 

Let us consider a population consisting of the mnk progeny of m 
sires each mated at random to n dams. We can now analyze and 
interpret the variances as follows, for a trait about which we can make 
the four assumptions previously stated: 


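Mean Squares                  Expected values                               Interpretation 

Sires, $MS_s$                 $\sigma_w^2 + k\sigma_d^2 + nk\sigma_s^2$     $\sigma_s^2 = r^G_{hs}\,\sigma_G^2$ 
Dams within sires, $MS_d$     $\sigma_w^2 + k\sigma_d^2$                    $\sigma_d^2 = (r^G_{fs} - r^G_{hs})\,\sigma_G^2$ 
Within families, $MS_w$       $\sigma_w^2$                                  $\sigma_w^2 = (1 - r^G_{fs})\,\sigma_G^2 + \sigma_E^2$ 

Since: 

$$ \rho_{hs} = \frac{\sigma_s^2}{\sigma_T^2} = r^G_{hs}\,\frac{\sigma_G^2}{\sigma_T^2}, \qquad \rho_{fs} = \frac{\sigma_s^2 + \sigma_d^2}{\sigma_T^2} = r^G_{fs}\,\frac{\sigma_G^2}{\sigma_T^2}, $$

where $\rho_{fs}$ and $\rho_{hs}$ are the phenotypic intraclass correlations for full 
sibs and half sibs, respectively, $r^G_{fs}$ and $r^G_{hs}$ are their genetic correlations 
and 

$$ \sigma_s^2 + \sigma_d^2 + \sigma_w^2 = \sigma_T^2 = \sigma_G^2 + \sigma_E^2. $$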


If we can assume random mating, $r^G_{hs} = 1/4$ and $r^G_{fs} = 1/2$; if our 
population were an inbred line the values would be different. If the 
values of $\sigma_s^2$ and $\sigma_d^2$ were significantly different and we were dealing 
with a trait for which we suspected environmental differences between 
dam means, that is, maternal effects, we would have 
$\sigma_d^2 = (r^G_{fs} - r^G_{hs})\,\sigma_G^2 + \sigma_M^2$, with $\sigma_M^2$ the maternal component. 
I have kept this example simple so that the relationships 
would be clear. Numerous papers have been written covering much 
more complex analyses, some of which are included among the references. 
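
As a hedged illustration of the scheme above, the following Python sketch solves the three expected-mean-square equations for the variance components and converts them into estimates of the additively genetic variance. It assumes a balanced design (n dams per sire, k progeny per dam), random mating (genetic correlations 1/4 for half sibs and 1/2 for full sibs) and no maternal effects. The function name and the numerical mean squares are hypothetical and are not taken from any data set.

```python
def sire_dam_components(ms_sires, ms_dams, ms_within, n, k,
                        r_hs=0.25, r_fs=0.5):
    """Estimate variance components from a balanced sire/dam analysis.

    Assumed expected mean squares (n dams per sire, k progeny per dam):
        E(MS_sires)  = var_w + k*var_d + n*k*var_s
        E(MS_dams)   = var_w + k*var_d
        E(MS_within) = var_w
    r_hs and r_fs are the genetic correlations of half sibs and full sibs.
    """
    var_w = ms_within
    var_d = (ms_dams - ms_within) / k
    var_s = (ms_sires - ms_dams) / (n * k)
    var_total = var_s + var_d + var_w
    # With no maternal effects, var_s = r_hs * var_G and
    # var_d = (r_fs - r_hs) * var_G give two estimates of var_G.
    var_g_from_sires = var_s / r_hs
    var_g_from_dams = var_d / (r_fs - r_hs)
    return {
        "var_s": var_s,
        "var_d": var_d,
        "var_w": var_w,
        "var_total": var_total,
        "var_g_from_sires": var_g_from_sires,
        "var_g_from_dams": var_g_from_dams,
        "h2_from_sires": var_g_from_sires / var_total,
    }


# Hypothetical mean squares for n = 4 dams per sire and k = 5 progeny per dam.
result = sire_dam_components(ms_sires=33.0, ms_dams=13.0, ms_within=8.0, n=4, k=5)
print(result)
```

A marked disagreement between the sire-based and dam-based estimates of the genetic variance would be the kind of evidence for maternal effects mentioned in the text.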

Success in mass selection for improvement of a trait and probably 
to a large extent for family selection as well, in the absence of intense 
inbreeding, depends upon the ratio of the additively genetic variance 
to the total variance, the heritability, usually denoted by $h^2$. Thus we 
must have some idea of the importance of non-additive effects. In 
practice this probably cannot be obtained from the above type of 
analysis of variance except that in theory advantage could be taken 
of the expectation that correlation between sufficiently distant relatives 
would be due entirely to additive variance whereas nonadditive genetic 
variance would contribute to the correlation between close relatives. 
However, we can resort to the comparison of the parent-offspring 
phenotypic correlation with the full sib phenotypic correlation, for the 
former will equal 1/2 the ratio of the additively genetic to the total 
variance while the latter includes, in addition, 1/4 the ratio of the 
dominance variance to the total variance. Intense selection leads to 
assortative mating in which case the regression of offspring on mid- 
parent should be used in place of the correlation; however with domi- 
nance present in any degree assortative mating introduces a correlation 
between the dominance deviations of parents and of offspring and 
between dominance deviations of either and additive deviations of the 
other so that the accurate estimation of the degree of heritability becomes 
practically impossible. 
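
The comparison of the two correlations described above can be sketched in a few lines of Python. The sketch assumes random mating, no epistasis and no environmental resemblance between relatives, so that the parent-offspring correlation is one half of the heritability and the full sib correlation exceeds it by one quarter of the dominance ratio. The function name and the input values are hypothetical.

```python
def heritability_and_dominance(r_parent_offspring, r_full_sib):
    """Return (h2, d2): the additive and dominance shares of total variance.

    Assumes r_parent_offspring = h2 / 2 and r_full_sib = h2 / 2 + d2 / 4.
    """
    h2 = 2.0 * r_parent_offspring
    d2 = 4.0 * (r_full_sib - r_parent_offspring)
    return h2, d2


# Hypothetical correlations, for illustration only.
h2, d2 = heritability_and_dominance(0.20, 0.25)
print(h2, d2)
```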

The detection of non-additive gene action from the relations of 
components of variance for just one generation of an actual population 
has, as far as I know, been attacked only by Comstock and Robinson 
in a series of related papers on the estimation of the average degree of 
dominance for a multigenic trait (1948, 1949, 1952). They have 
done extensive work on the appropriate design for estimating a measure 
of dominance, $a$, which they define as 


$$ a = \frac{(Aa - AA) + (Aa - aa)}{AA - aa} $$


while AA − aa = 2u. For two designs the experimental material consists 
of progeny from random matings among plants of the $F_2$ generation 
of a cross of two nearly isogenic lines; in Experiment 1 each of sm male 
parents is mated with n different female parents, while in Experiment 
2 all of the mn possible matings of m males and n females in each of s 
sets are made. In both cases there are s sets of progeny from smn 
matings in a randomized block arrangement with plot replications. 
The experimental material in the third design consists of s sets of n 
pairs of progenies, the members of each pair having the same $F_2$ male 
parent but different female parents from each of the two inbred lines 
which produced the $F_2$. The assumptions made in deriving the genetic 
interpretations of variance components they state to be fulfilled with 
two exceptions: (1) no epistasis and (2) no linkage among genes affect- 
ing the trait or, if linkages exist, the distribution of genotypes is at 
equilibrium with respect to coupling and repulsion phases. They 
point out that the failure of these assumptions to be valid causes an 
upward bias in their estimates of $a$. 

They defined the additive genetic variance for the $i$th locus as that 
portion of the variance of genetic effects explained by the regression 
of the genetic effect, y, on the number, x, of A (or a) genes in the 
genotype, and the dominance variance as the variation of deviations from 
that regression. With random matings and a frequency of .5 for A 
for all loci at which there was segregation they derived for n genes in 
Experiment 1 the following expressions: 


$$ \sigma_G^2 = \tfrac{1}{2}\sum_i u_i^2; \qquad \sigma_D^2 = \tfrac{1}{4}\sum_i a_i^2 u_i^2; \qquad \bar{a}^2 = \frac{\sum_i a_i^2 u_i^2}{\sum_i u_i^2} = \frac{2\sigma_D^2}{\sigma_G^2}. $$


$\bar{a}^2$ is thus the mean of the $a^2$'s for all loci, weighted relative to the 
$u^2$'s for the corresponding loci. The variance arising from differences 
in males, $\sigma_m^2$, is shown to be equal to $\sigma_G^2/4$ and that arising from 
differences in females, $\sigma_f^2$, equal to $(\sigma_G^2 + \sigma_D^2)/4$. $\sigma_m^2$ and $\sigma_f^2$, components for 
the mean square expectations, contain only genetic variance under 
these experimental conditions and for a trait having no maternal 
effects; hence 


$$ \left[\frac{2(\hat\sigma_f^2 - \hat\sigma_m^2)}{\hat\sigma_m^2}\right]^{1/2} $$


is an estimate of $\bar{a}$. The estimate of $\bar{a}$ for Experiments 2 and 3 is 


$$ \left[\frac{4\hat\sigma_{mf}^2}{\hat\sigma_m^2 + \hat\sigma_f^2}\right]^{1/2} \qquad \text{and} \qquad \left[\frac{\hat\sigma_{ml}^2}{2\hat\sigma_m^2}\right]^{1/2} $$


respectively, where $\sigma_{mf}^2$ is the progeny variance due to interaction of 
male and female parents, $\sigma_f^2$ and $\sigma_m^2$ are as defined above and $\sigma_{ml}^2$ is the 
progeny variance due to interaction of $F_2$ and inbred parents. An 
estimate of $\sigma_G^2$, the additive genetic variance, is also obtained from the 
data of these experiments. 

The authors point out that $\bar{a}$, the weighted average degree of dominance, 
will be somewhat larger than the mean of the a's, since 
at least some a's are unequal, and suggest that the bias might be large 
if some a's were positive and others negative, since it is the average 
absolute magnitude of a that is being estimated. If the estimate of $\bar{a}$ is 
significantly greater than 1, at least one of the a's must be greater than 1 so that 
overdominance at one or more loci is indicated. If the assumptions of 
equilibrium with respect to segregation of linked genes and no 
interallelic interactions do not hold, the estimate may be significantly greater than 1 
even when there is no overdominance. 
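
The relation between the male and female components and the average degree of dominance in Experiment 1 can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it takes the male component as one quarter of the additive variance and the female component as one quarter of the additive plus dominance variance, as stated in the text, and it assumes per-locus variances of u²/2 (additive) and a²u²/4 (dominance) at gene frequency .5. The locus effects used are hypothetical.

```python
import math


def experiment1_variances(u, a):
    """Variances implied by per-locus effects u_i and dominance ratios a_i.

    Assumes gene frequency 0.5 at every locus, no epistasis and no linkage.
    Returns (var_g, var_d, var_m, var_f).
    """
    var_g = 0.5 * sum(ui ** 2 for ui in u)
    var_d = 0.25 * sum((ai * ui) ** 2 for ai, ui in zip(a, u))
    var_m = var_g / 4.0            # component for males
    var_f = (var_g + var_d) / 4.0  # component for females
    return var_g, var_d, var_m, var_f


def average_dominance(var_m, var_f):
    """sqrt(2 (var_f - var_m) / var_m); fails if var_f < var_m."""
    return math.sqrt(2.0 * (var_f - var_m) / var_m)


# Two hypothetical loci.
u = [1.0, 2.0]
a = [0.5, 1.0]
var_g, var_d, var_m, var_f = experiment1_variances(u, a)
a_bar = average_dominance(var_m, var_f)
weighted = sum((ai * ui) ** 2 for ai, ui in zip(a, u)) / sum(ui ** 2 for ui in u)
print(a_bar ** 2, weighted)  # the two quantities agree
```

With estimated components the difference between the female and male components can be negative, in which case the square root is undefined and the estimate is usually reported as zero.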

Experiment 3 does not depend upon having gene frequencies of .5 
and in this case the quantity estimated, $\bar{a}'$, is given by 


$$ \bar{a}'^2 = \frac{\sum_i p_i q_i a_i^2 u_i^2}{\sum_i p_i q_i u_i^2}, $$


where $p_i$ and $q_i$ are the frequencies of the two alleles at the $i$th locus. 
Thus the weighting of the a's depends to some extent on shifts in gene 
frequency which may be variable by loci. The upward bias due to 
linkage will again be present; however, if the bias declines rapidly as 
opportunity is provided for recombination, Experiment 3 offers a 
means of measuring that decline since the probability of an estimate 
significantly greater than 1 is a function of the expected value of the 
estimate rather than of $\bar{a}$ when the two are not equal. It is suggested 
that the apparent overdominance possible in these estimates of $\bar{a}$ has 
much the same significance for short-run breeding practice as true 
overdominance. 

An exact F test for Experiment 3 and approximate F tests for 
Experiments 1 and 2 are presented: for example, in testing for 
overdominance we test whether $\bar{a}$ is significantly greater than 1; if we want 
to establish the conclusion that the various loci exhibit no dominance 
or only partial dominance, we test whether $\bar{a}$ is significantly less than 1. 
The F tests are essentially tests of whether one mean square differs 
significantly from an estimate of this mean square based on a linear 
combination of other mean squares. The estimate used is such that 
its expected value is equal to that of the mean square it is estimating 
when $\bar{a}$ is equal to 1. 

It is shown that, as would be expected, Experiment 2, when the 
experimental material permits its use, is better than Experiment 1 
since the estimate of $\bar{a}$ depends on mean squares with fewer degrees of 
freedom in the first experiment. Experiment 3 is shown to be the most 
powerful, the plot requirement being 1/12 to 1/10 as great as for 
Experiment 1 and 1/4 to 1/2 as great as for Experiment 2. 


Statistical Techniques 


Most of the published papers on estimating variance components 
are concerned with the one-way classification, nested classifications 
and with factorial classifications having equal sub-class numbers. Data 
from breeding experiments often, in fact usually, involve unequal 
numbers of classes and class numbers. This causes no real trouble 
when we are dealing with nested classifications until we reach the point 
of estimating errors but does create difficulties in factorial experiments. 
Furthermore, we are frequently dealing with the Mixed Model in 
which some of the effects are assumed to be fixed rather than random 
variables. Biometrics 1953 presented a paper by Henderson which 
has satisfied a real need. Here Henderson discusses three methods for 
estimating variance components under the above mentioned handicaps 
and illustrates their application with some genetic data. I shall outline 
the methods and give his conclusions concerning them. 

Method 1 consists in computing sums of squares as in the correspond- 
ing orthogonal case, equating these sums of squares to their expectations, 
derived under the assumptions of the Eisenhart Model II, and solving 
for the unknown components. Formulas for the computation of the 
coefficients of the various components of variance in the expected 
values of the different sums of squares are given. Method 1 is the 
simplest but gives biased estimates of some of the components when we 
are dealing with the Mixed Model while assuming Model II. Another 
bias is present if some of the elements of the Model are correlated. 


Method 2 is again not difficult. The model Henderson has taken is as 
follows: 


$$ y_{hijk} = a_h + \mu + h_i + s_j + (hs)_{ij} + e_{hijk} $$


where the $a_h$'s, h = 1, 2, ..., p, are fixed effects. Least squares estimates 
of the $a$'s and of the $d_{ij}$'s [$d_{ij} = \mu + h_i + s_j + (hs)_{ij}$] are obtained 
jointly. The least squares equations reduce to p in number and, with 
the imposition of one restriction on the $a$'s, reduce again to p − 1 in 
p − 1 unknowns. Solutions for the $\hat{a}_h$ are obtained as well as the 
inverse of the (p − 1)-rowed matrix used in the solutions. This inverse, 
in turn, is used in combination with different matrices formed from the 
various class or subclass numbers to estimate the corrected coefficients 
for the variance components corresponding to the corrected sums of 
squares. The latter are obtained from new class totals, corrected for 
the $a$'s. The $\sum y_{hijk}^2$ is corrected by the reduction $R(a_h, d_{ij})$ and all 
components are then estimated. Method 2 gives estimates which are 
free from the bias resulting from using Method 1 when some of the 
effects are fixed but is still biased when some of the effects are correlated. 

Method 3, which is unbiased but formidable, consists in computing 
the mean squares by a conventional least squares analysis (method of 
fitting constants, for example) of nonorthogonal data, equating these 
mean squares to their expectations and solving for the unknown 
variances. As in Method 2, the inversions of certain matrices are re- 
quired in order to obtain the coefficients of the variance components in 
the expectations. The relative sizes of the sampling variances of esti- 
mates obtained by the three methods are not known. 

An excellent report of the progress which has been made in the 
estimation of variance components and of sampling variances of these 
estimates has been presented by Crump (1951) who also indicates 
situations for which the sampling variances are not known. All too 
frequently people working in animal and plant breeding find them- 
selves facing just such a situation. Since estimates of genetic para- 
meters, for example, genetic variance, dominance variance, heritability, 
etc., are functions of one or more mean squares from the analysis of 
variance, estimation of the sampling variances of such estimates is 
difficult at best. 

Crump defines a balanced classification as one in which all of the 
classes or subclasses of any chosen rank contain the same number of 
observations. If we consider first a balanced multiple classification for 
Model II with degrees of freedom which are not very large and we are 
interested in estimating the sampling variance of an estimate of a 
genetic parameter, 


$$ \hat{g} = a_1 M_1 + a_2 M_2 + \cdots $$


where $M_i$ is a mean square with degrees of freedom $r_i$, several methods 
of attack are open to us. Satterthwaite (1941, 1946) has examined the 
distribution of $\hat{g}$ and has recommended that it be approximated by a 
$\chi^2$ distribution with effective degrees of freedom, r, determined by 
the relation 


$$ r = \frac{[a_1 M_1 + a_2 M_2 + \cdots]^2}{\dfrac{(a_1 M_1)^2}{r_1} + \dfrac{(a_2 M_2)^2}{r_2} + \cdots} $$


He suggests that the approximation should be used cautiously when 
one or more of the a's is negative since the approximating distribution 
does not allow negative values of $\hat{g}$. An approximation of this type 
was suggested by H. F. Smith (1936) for a problem involving only two 
mean squares with $a_1 = a_2 = 1$. Bross (1950) has constructed an 
approximate fiducial interval for a variance component, $\sigma_a^2$, arising 
from the class differences in a one-way classification based upon Fisher's 
solution (1935). The limits are functions of $\hat\sigma_a^2$, F obtained from the 
data and tabular values of $F_\alpha$. Bross also gives approximate confidence 
intervals for $\sigma_a^2$ by using the fact that when $\sigma_a^2 \neq 0$, $M_b/M_w$ is 
distributed as $F\,E(M_b)/E(M_w)$ so that, if $E(M_w) = \sigma_e^2$ and $E(M_b) = \sigma_e^2 + n\sigma_a^2$, 
$(F/F_\alpha - 1)\,\sigma_e^2/n$ is an exact lower confidence limit for $\sigma_a^2$ and 
$(F/F_\alpha - 1)\,\hat\sigma_a^2/(F - 1)$ is a rough lower confidence limit. Both 
Satterthwaite and Bross investigated to some extent the accuracy of their 
approximations but, as Crump points out, more investigation is needed. 
Cochran (1951) presents an approximate F test, 


$$ F' = \frac{M_1 + M_2 + \cdots + M_r}{M_{r+1} + M_{r+2} + \cdots + M_s}, $$

for a linear relation among variances, 
$\theta_1 + \theta_2 + \cdots + \theta_r = \theta_{r+1} + \theta_{r+2} + \cdots + \theta_s$, 
using as one of his illustrative problems that of Comstock 
and Robinson discussed previously in which the expectations of 
three mean squares were connected by a linear relation, the coefficients 
being functions of the quantity $\bar{a}$ which was used as a measure of the 
degree of dominance. Hence the null hypothesis that $\bar{a}$ is not different 
from a specified value leads to known values for the coefficients and the 
linear relation can be tested from the data. The effective degrees of 
freedom, $n_1'$ and $n_2'$, for Cochran's F' test are found by the rule suggested 
by Smith and Satterthwaite. Of the possible F' ratios which could be 
formed, Cochran suggests using one for which the coefficients of the 
mean squares are positive. He investigates and recommends the F' 
test in the case of three variances only, $\theta_3 = \theta_1 + \theta_2$, affirming that the 
approximation would be less satisfactory with four variances since two 
nuisance parameters of the type $\theta_i/\theta_j$ would be involved. 
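
The Satterthwaite rule for effective degrees of freedom, and its use in an approximate F' ratio of sums of mean squares, can be sketched as follows. This is a minimal illustration: the function names and the numerical inputs are hypothetical, and the F' helper assumes that every mean square enters with coefficient 1, which need not match the coefficients in any particular application.

```python
def satterthwaite_df(coeffs, mean_squares, dfs):
    """Effective degrees of freedom of sum(a_i * M_i).

    r = (sum a_i M_i)^2 / sum((a_i M_i)^2 / r_i)
    """
    terms = [a * m for a, m in zip(coeffs, mean_squares)]
    total = sum(terms)
    return total ** 2 / sum(t ** 2 / r for t, r in zip(terms, dfs))


def approximate_f_prime(numerator, denominator):
    """F' for two lists of (mean_square, df) pairs, each with coefficient 1.

    Returns (F', effective df of numerator, effective df of denominator).
    """
    num_ms = [m for m, _ in numerator]
    num_df = [r for _, r in numerator]
    den_ms = [m for m, _ in denominator]
    den_df = [r for _, r in denominator]
    f_prime = sum(num_ms) / sum(den_ms)
    n1 = satterthwaite_df([1.0] * len(num_ms), num_ms, num_df)
    n2 = satterthwaite_df([1.0] * len(den_ms), den_ms, den_df)
    return f_prime, n1, n2


# Hypothetical mean squares and degrees of freedom.
print(satterthwaite_df([1.0, 1.0], [10.0, 10.0], [5, 5]))
print(approximate_f_prime([(12.0, 4)], [(3.0, 6), (3.0, 6)]))
```

A single mean square keeps its own degrees of freedom under this rule, which is a quick sanity check on any implementation.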

If we take up the unbalanced case, which will be the one most 
likely to be encountered in animal and plant work we find ourselves 
very much in the dark with respect to the reliability of estimates of 
genetic parameters. Consider first the one-way classification under 
Model II: $y_{ij} = a_i + e_{ij} + \mu$ with the variances of the normally 
distributed $a_i$'s and $e_{ij}$'s being $\sigma_a^2$ and $\sigma_e^2$ respectively. We observe 
$a$ classes, the $i$th containing $N_i$ individuals. Now the within class mean square will be 
distributed like $\chi^2$ but the between class mean square, while independent 
of the former, will not have an ordinary $\chi^2$ distribution when $\sigma_a^2 \neq 0$. 
In addition, Crump (1951) points out that $\hat\sigma_e^2 = M_w$ and 
$\hat\sigma_a^2 = (M_b - M_w)/N_0$ are not maximum likelihood estimates of $\sigma_e^2$ and $\sigma_a^2$. 
$N_0$ is the coefficient of $\sigma_a^2$ in $E(M_b)$ and is equal to 
$\frac{1}{a-1}\left[\sum N_i - \sum N_i^2/\sum N_i\right]$. Crump has derived the sampling variances of $\hat\sigma_e^2$ and 
$\hat\sigma_a^2$ as well as those of $\tilde\sigma_e^2$ and $\tilde\sigma_a^2$, the maximum likelihood estimates of 
$\sigma_e^2$ and $\sigma_a^2$. He shows that $V(\hat\sigma_a^2)$, the variance of $\hat\sigma_a^2$, approaches that 
of $\tilde\sigma_a^2$, $V(\tilde\sigma_a^2)$, as the numbers in the classes increase, independently of 
a, and points out that $\hat\sigma_a^2$ has a low efficiency relative to $\tilde\sigma_a^2$ 
when a, the number of classes, is small, though the ratio $V(\tilde\sigma_a^2)/V(\hat\sigma_a^2)$ 
proved to be so complex that he was unable to study its behavior. 
Tukey (1950) estimates $\sigma_e^2$ by $M_w$, and $\sigma_a^2$ by 


and derives the sampling variances which, according to Crump, have 
not yet been compared with $V(\hat\sigma_a^2)$ and $V(\tilde\sigma_a^2)$. 
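
The analysis of variance estimates for the unbalanced one-way classification can be illustrated with a short Python sketch. It computes the between and within class mean squares, the coefficient N_0 = [ΣN_i − ΣN_i²/ΣN_i]/(a − 1), and the two component estimates. The function name and the small data set are hypothetical, and the sketch makes no attempt at the maximum likelihood estimates discussed above.

```python
def one_way_components(groups):
    """ANOVA estimates of the variance components in a one-way layout.

    groups is a list of lists of observations, one list per class.
    """
    a = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    m_b = ss_between / (a - 1)
    m_w = ss_within / (n_total - a)
    # Coefficient of the between-class component in E(M_b).
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (a - 1)
    return {
        "M_b": m_b,
        "M_w": m_w,
        "N_0": n0,
        "var_e": m_w,
        "var_a": (m_b - m_w) / n0,
    }


# Hypothetical unbalanced data with class sizes 3, 2 and 4.
print(one_way_components([[1, 2, 3], [4, 6], [7, 8, 9, 10]]))
```

When every class has the same size n, N_0 reduces to n and the estimates coincide with the familiar balanced formulas.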

Apparently no sampling variances have been derived in the unbalanced 
case for multiple classifications under Model II or the Mixed 
Model and these are of course just what are needed if we are to estimate 
the sampling variance of the estimate of the genetic variance, $\hat\sigma_G^2$, obtained 
from an unbalanced classification corresponding to the scheme presented 
previously. An estimate $c_1\hat\sigma_s^2 + c_2\hat\sigma_d^2$ of $\sigma_G^2$, based on the sire and dam 
components, can be derived from that 
scheme but until we know something of the sampling variances we 
lack criteria for choosing those values of $c_1$ and $c_2$ which will give 
the best estimate of $h^2$. Osborne (1952) has published approximate 
sampling variances for several estimates of $h^2$ but as these are functions 
of the unknown sampling variances of the components $\hat\sigma_s^2$, $\hat\sigma_d^2$ and $\hat\sigma_w^2$ 
his approximations could be poor in the commonly occurring unbalanced 
case. Comstock and Robinson, in their experimental work on the 
consistency of estimates of variance components in a balanced design, 
point out that their results cannot speak for other sorts of data. 

Wald (1940, 1941) gives a method for placing confidence limits on 
the ratio of any variance component to the error component for the 
unbalanced case in multiple classifications under Model II. For the 
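one-way classification, for example, Wald showed that 


$$ F = \frac{N - a}{a - 1}\cdot\frac{\sum_i w_i(\bar{y}_{i\cdot} - \bar{y}_w)^2}{\sum_i\sum_j (y_{ij} - \bar{y}_{i\cdot})^2}, $$


where $w_i = \dfrac{N_i}{1 + N_i\lambda^2}$, $\lambda^2 = \dfrac{\sigma_a^2}{\sigma_e^2}$ and $\bar{y}_w = \sum_i w_i\bar{y}_{i\cdot}/\sum_i w_i$, 
has the analysis of variance F distribution with a − 1 and N − a 
degrees of freedom. Thus the lower confidence limit is given by the 
root of the following equation in $\lambda^2$: 


$$ F_{\alpha_2} = \frac{N - a}{a - 1}\cdot\frac{\sum_i w_i(\bar{y}_{i\cdot} - \bar{y}_w)^2}{\sum_i\sum_j (y_{ij} - \bar{y}_{i\cdot})^2}. $$


Wald shows that each of the two equations, one for $F_{\alpha_1}$ and one for 
$F_{\alpha_2}$, has at most one root in $\lambda^2$; that if one equation has no root the 
corresponding confidence limit must be set equal to 0 and that if neither 
has a root we must reject one of the hypotheses: 


$y_{ij} = a_i + e_{ij} + \mu$ 

$e_{ij}$ and $a_i$ are normally and independently distributed 

Each $e_{ij}$ has the same distribution 

Each $a_i$ has the same distribution. 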


Solutions of such equations are obviously difficult in practice but if 
they are obtained for the particular types of lack of balance with which 
one is accustomed to work, one would have some idea of the accuracy 
of approximations he may be using. 
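
With a computer the root-finding step is less of an obstacle. The sketch below evaluates Wald's weighted ratio for a trial value of λ² = σ_a²/σ_e² and locates by bisection the value at which it equals a tabular F supplied by the user. It is an illustration only: the function names, the search bound and the data are hypothetical, the tabular F value must be looked up separately, and the code relies on the ratio falling toward zero as λ² becomes large.

```python
def wald_statistic(groups, lam2):
    """Wald's ratio for the one-way layout at a trial lam2 = var_a / var_e."""
    a = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    means = [sum(g) / len(g) for g in groups]
    weights = [n / (1.0 + n * lam2) for n in sizes]
    ybar_w = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    between = sum(w * (m - ybar_w) ** 2 for w, m in zip(weights, means))
    within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (n_total - a) / (a - 1) * between / within


def wald_limit(groups, f_crit, upper=1e6, tol=1e-12):
    """Root in lam2 of wald_statistic = f_crit, or 0.0 when there is none."""
    if wald_statistic(groups, 0.0) <= f_crit:
        return 0.0  # no positive root: the limit is set equal to 0
    if wald_statistic(groups, upper) > f_crit:
        raise ValueError("no root below the search bound")
    lo, hi = 0.0, upper
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if wald_statistic(groups, mid) > f_crit:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical data; 5.14 is the tabular upper 5 per cent point of F(2, 6).
data = [[1, 2, 3], [4, 6], [7, 8, 9, 10]]
print(wald_statistic(data, 0.0))  # ordinary F ratio when lam2 = 0
print(wald_limit(data, 5.14))     # lower limit for lam2
```

Calling the same function with the lower tabular F value gives the upper limit, subject to the same proviso about missing roots.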

I should like to add in closing that an unbalanced classification in 
which the subclass numbers are correlated genetically with the variate 
studied will give rise to further difficulties and that this is not an 
unlikely situation for certain traits. 


FITTING THE NEYMAN TYPE A (TWO PARAMETER) 
CONTAGIOUS DISTRIBUTION 


J. B. Douglas 


School of Mathematics, 
N.S.W. University of Technology, Australia. 


Introduction 


One of the difficulties associated with the use of the Neyman con- 
tagious distributions (Neyman (1939)) concerns the method of fitting 
to data. Restricting attention to the two parameter Type A distri- 
bution, the method suggested by Neyman (with a remark that its 
efficiency needed investigation: there are no sufficient estimators) was 
to equate the corresponding first and second moments of the data 
and the distribution—this gives two readily soluble equations for the 
two estimators. Shenton (1949) investigated the efficiency of this 
moment fit, and outlined a technique for an iterative maximum likeli- 
hood fitting process, together with suggestions relating to the circum- 
stances in which the process might be worth applying. Owing to the 
complicated nature of the recurrence relation for successive probabilities, 
the distribution is rather tedious to handle in any circumstances, and 
it is unfortunately the case that the maximum likelihood process 
suggested by Shenton increases considerably the labour of fitting. 
Recent papers (e.g., Beall and Rescia (1953)) stress the need for a 
technique which would reduce the amount of calculation—this paper 
suggests a method which greatly shortens the labour of obtaining a 
maximum likelihood fit, and which reduces the calculation necessary 
for a comparison of observation and expectation whatever the method 
of fitting used. As with the Shenton technique, the successive approxi- 
mations for the maximum likelihood fit are based on the Newton- 
Raphson method. 

The case where the zero class is unknown is also briefly discussed. 


I The Complete Two Parameter Neyman Type A Distribution 


The probability of the occurrence of x individuals is given by 


(1) $P_x = e^{-\mu}\,\dfrac{\nu^x}{x!}\sum_{r=0}^{\infty}\dfrac{(\mu e^{-\nu})^r\, r^x}{r!}, \qquad x = 0, 1, 2, \ldots; \quad \mu, \nu > 0,$ 


where $\mu$ and $\nu$ are positive parameters (in Neyman's 1939 paper written 
$m_1$ and $m_2$ respectively, but the suffixes are rather inconvenient when 
generalizations are not being considered). Successive probabilities are 
found from the recurrence relation 


(2) $P_{x+1} = \dfrac{\mu\nu e^{-\nu}}{x+1}\sum_{r=0}^{x}\dfrac{\nu^r}{r!}\,P_{x-r}.$ 


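For a sample with observed frequencies of $f_0$ in the zero class, $f_1$ in 
the class with one member, ..., $f_x$ in the class with x members, ..., 
and power sums 


$S_r = \sum_x x^r f_x$, say, 


the summation being over all observed classes, the moment estimators 
$\hat\mu_m$, $\hat\nu_m$ of $\mu$, $\nu$ respectively are given by 


(3) $\hat\nu_m = \dfrac{S_2}{S_1} - \dfrac{S_1}{S_0} - 1, \qquad \hat\mu_m = \dfrac{S_1}{S_0\,\hat\nu_m},$ 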


in a form convenient for desk machine computation*. 

*Neyman (1939) apparently used an unbiased second moment estimator. The difference is 
unimportant for the large sample sizes here dealt with. 

The maximum likelihood estimators $\hat\mu$, $\hat\nu$ are the solutions of the 
likelihood equations 


(4) $\mu\nu = S_1/S_0 = \bar{x}$, say, 

$\sum_x f_x \pi_x = S_1$, with $\pi_x = (x + 1)P_{x+1}/P_x$, 


and effectively the procedure suggested by Shenton (1949) was to write 

(5) $F(\nu) \equiv \sum_x f_x \pi_x - S_1$, 


where $\mu$ was supposed eliminated through the first likelihood equation, 
and, with first approximations $\mu_1$, $\nu_1$ obtained from the moment 
equations, to determine second approximations $\mu_2$, $\nu_2$ from 


Me = . 


(6) 


(Shenton’s equation (9) has a misprint in it: P,,, in the final term should 
be P 

Writing the expressions in the above form suggests what is in fact 
a fairly convenient way of proceeding: tabulating x, $f_x$, $P_x$, $\pi_x$ and 
$\chi_x$ in turn enables one to calculate all the terms necessary and to 
maintain simultaneous control of the accuracy, with a reasonably 
complete check of the calculations. It might be pointed out that 
control of the accuracy needs some care, owing to the occurrence of 
ratios and subsequent differencing: it is necessary to carry many 
significant figures in the early stages (say, in the $P_x$) in order to retain 
digits with meaning in the final stages for large values of x (say, in 
the $\chi_x$); it is very easy to give entirely meaningless digits in the sums 
for both $F(\nu)$ and $F'(\nu)$. 

However, the process requires iteration, very often, and the labour 
involved is so considerable that this is not likely to be carried out for 
routine fitting. But it is possible to rewrite some of the preceding work, 
so that with the provision of suitable Tables the labour can be much 
reduced: the various $P_x$ need not be calculated, for example, until a 
direct comparison with the observed frequencies is required, and then 
only to the accuracy necessary for such a comparison in place of the 
extreme accuracy required as described above. 


For 

(7) $\lambda = \mu e^{-\nu}, \qquad P_0 = e^{-\mu+\lambda},$ 

and writing 

$\mu'_x = e^{-\lambda}\sum_{r=0}^{\infty}\dfrac{r^x\lambda^r}{r!},$ 


so that $\mu'_x$ is the x-th power moment about the origin of a Poisson 
distribution with parameter $\lambda$, 


(8) $P_x = P_0\,\dfrac{\nu^x}{x!}\,\mu'_x.$ 


Given a table of $\mu'_x$, this means that $P_x$ can be calculated without 
knowledge of $P_{x-1}$, $P_{x-2}$, ...; in practice, it turns out to be more 
convenient to tabulate 


(9) $p_x = \mu'_{x+1}/\mu'_x$, say, 

whence 

(10) $P_{x+1} = \dfrac{\nu\,p_x}{x+1}\,P_x,$ 


a recurrence relation involving only the immediately preceding prob- 
ability. 
The maximum likelihood equations are then 


pb = 
(11) 
0 


writing as before 


and exactly the procedure outlined above applies. However, given 
the Table of $p_x$ with interlinear values of $q_x$, all that need be done is 
to enter the table with the value of 


$\lambda_1 = \mu_1 e^{-\nu_1}$ 


from a moment fit and cumulate $f_x p_x$ and $f_x q_x$, revised estimates of 
$\nu$ and $\mu$ being obtained from 


$\nu_2 = \nu_1 - F(\nu_1)/F'(\nu_1)$ and $\mu_2 = \bar{x}/\nu_2$, 


respectively. Iteration, until no change is produced in the estimates, 
requires only minutes, compared with hours for a single iteration for 
the previous process. 

To illustrate the procedure, the European Corn Borer data quoted 
by Neyman (1939) will be used, although in fact one would expect the 
moment fit to be of reasonably high efficiency (from Shenton’s results). 
Here the moment estimates for $\mu$, $\nu$ are 2.21, 1.48, giving an estimate 
for $\lambda = \mu e^{-\nu}$ of 0.53. Writing the observed frequencies $f_x$ across a slip 
of paper in the positions corresponding to x = 0, 1, 2, ..., 12 in the 
Table, cumulation of $f_x p_x$ and $f_x q_x$ leads to revised estimates for $\mu$, $\nu$ 
of 1.98, 1.60. These give a revised estimate for $\lambda$ of 0.40, which leads 
to a further pair of estimates for $\mu$, $\nu$ of 2.06, 1.54; use of these (with 
$\lambda$ estimated by 0.44) leads again to the same pair of estimates. (No 
greater accuracy can be obtained from the Table.) 

The calculation of the $P_x$ is made from 


Py 
ane 


and the observed frequencies $f_x$ are shown below with the expected 
frequencies $\phi_x = S_0 P_x$. These may also be compared with the expected 
frequencies $F_x$ from the moment fit, and with the expected frequencies 
$\Phi_x$ from a maximum likelihood fit with four iterations along the lines 
set out by Shenton (the final estimates of $\mu$, $\nu$ are 2.063, 1.535). 


TABLE I 


x          0      1      2      3      4      5      6      7      8+     $\chi^2$ 
$f_x$      24     16     16     18     15     9      6      5      11 
$\phi_x$   ...    16.0   18.1   ...    ...    10.1   7.3    ...    10.7   ... 
$F_x$      22.3   16.8   18.4   16.5   13.4   10.3   7.5    5.2    9.6    1.48 
$\Phi_x$   23.8   16.2   18.0   16.1   13.2   10.2   7.5    5.3    9.7    1.33 


(In each case, the $\chi^2$ has 6 degrees of freedom.) The discrepancies 
between the $\phi_x$ and $\Phi_x$ are small; since in fact the earlier classes are 
the more important (cf., e.g., Anscombe (1949), (1950)), the tendency 
for accumulation of errors for large values of x is probably unimportant. 
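
The direct calculation that the Table is designed to shorten is easy to reproduce by machine. The Python sketch below computes the Type A probabilities from P_0 = exp[−μ(1 − e^{−ν})] and the standard recurrence for successive probabilities, and the moment estimates from the power sums. The function names are hypothetical; the parameter values 2.063 and 1.535 and the sample size of 120 are those quoted above for the corn borer data, and the grouped frequencies themselves are not used.

```python
import math


def neyman_type_a_probs(mu, nu, x_max):
    """Return [P_0, ..., P_x_max] for the two parameter Type A distribution."""
    probs = [math.exp(-mu * (1.0 - math.exp(-nu)))]
    c = mu * nu * math.exp(-nu)
    for x in range(x_max):
        # P_{x+1} = c / (x + 1) * sum_{r=0}^{x} nu^r / r! * P_{x-r}
        s = sum(nu ** r / math.factorial(r) * probs[x - r] for r in range(x + 1))
        probs.append(c / (x + 1) * s)
    return probs


def moment_estimates(freqs):
    """Moment estimates (mu, nu) from frequencies f_0, f_1, ... of x = 0, 1, ..."""
    s0 = sum(freqs)
    s1 = sum(x * f for x, f in enumerate(freqs))
    s2 = sum(x * x * f for x, f in enumerate(freqs))
    nu = s2 / s1 - s1 / s0 - 1.0
    mu = s1 / (s0 * nu)
    return mu, nu


# Expected frequencies for a sample of 120 with mu = 2.063, nu = 1.535.
probs = neyman_type_a_probs(2.063, 1.535, 60)
print([round(120 * p, 1) for p in probs[:8]])
```

The first few expected frequencies agree with the maximum likelihood row of Table I to the accuracy printed there, and carrying the recurrence to x = 60 leaves a negligible tail.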


II The Truncated Two Parameter Neyman Type A Distribution 


In some circumstances, the zero class is either unreliable or entirely 
unknown, as for instance when one cannot be sure that all individuals 
not possessing some characteristic have been identified, or where the 
number of animals not trapped even once is quite unknown. (The 
circumstance of misclassification is not considered, i.e., $f_0$ is suspect 
or unknown, but not $f_1$, $f_2$, ... .) 

The distribution then appropriate has probabilities $P'_x$ related to 
the previous $P_x$ by 


(13) $P'_x = \dfrac{P_x}{1 - P_0}, \qquad x = 1, 2, \ldots,$ 

and the moment estimators $\hat\mu'_m$, $\hat\nu'_m$ are given by* 


(14) j 
+ Ba) = 1, 
where 
= exp [—An(1 — 
and t 


Sl= zf, = say, 


z>0 
z’ = 


However, explicit solutions for $\hat\mu'_m$ and $\hat\nu'_m$ cannot be obtained, nor in 
fact do positive solutions (required by the physical problem) always 
exist. 
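
The truncated probabilities are simple to evaluate once the complete probabilities are available, since each P_x for x > 0 is divided by 1 − P_0. The sketch below is an illustration with a hypothetical function name; it uses the estimates 1.49 and 1.21 and the 202 observations with x > 0 that are reported for the leaf count example later in this section.

```python
import math


def truncated_expected(mu, nu, n_positive, x_max):
    """Expected frequencies for x = 0, ..., x_max when only x > 0 is observed.

    n_positive is the number of observations with x > 0. Entry 0 of the
    result is the zero class implied by the fit, not an observed quantity.
    """
    p0 = math.exp(-mu * (1.0 - math.exp(-nu)))
    probs = [p0]
    c = mu * nu * math.exp(-nu)
    for x in range(x_max):
        s = sum(nu ** r / math.factorial(r) * probs[x - r] for r in range(x + 1))
        probs.append(c / (x + 1) * s)
    scale = n_positive / (1.0 - p0)  # P'_x = P_x / (1 - P_0)
    return [scale * p for p in probs]


expected = truncated_expected(1.49, 1.21, 202, 6)
print([round(e, 1) for e in expected])
```

Because the published estimates are rounded to two decimals, the computed frequencies differ slightly from those printed in Table II.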

It is possible to use the “analogues” of the moment equations (3) 
in I: 


Because these quite commonly lead to negative estimates, their appli- 
cation is not considerable, except as giving some idea of reasonable 
first approximations for the processes described below, since it is in 
fact so easy to calculate their values. 

Following the Shenton procedure, a maximum likelihood fit for 
$\mu$ and $\nu$ leads in the present case to the pair of equations 


= Z’, 
(16) 1 — P, 
1— Pe’ = 


and writing 


(17a) Fo) =1- Pe" fame, 


*Obtained by J. H. Bennett (1950), unpublished. 

where $\mu$ is regarded as being eliminated (though this cannot be done 
explicitly) by using the former of this pair of equations, (16), 


(17b) = 1 — we ) 


1 — p(l — 
1 


VO1 


where again 


= (x + 1) =@+)>; 


Xe = — Te); 
and 


= P,/(1 — Py). 


As before, given first approximations $\mu_1$, $\nu_1$, second approximations 
$\mu_2$, $\nu_2$ can be found from 


F(v,)/F’(), 
(18) 


Once more the labour is considerable, and the use of the device 
appropriate before leads to the equivalent maximum likelihood equations 


= 
(19) 1— P, 
with 
F(v) =1—-—Pe” pag 
(20) 


1 — pre 
giving revised estimates as above. The calculations for $\sum' f_x p_x$ and 
$\sum' f_x q_x$ are again brief, though the substitution in the expression for 
$F'(\nu)$ is worth arranging systematically and in any case takes some 
time*. 

If it is possible to make a reasonable assumption regarding the 
frequency in the zero class, a convenient procedure seems to be to use 
this “completed” sample to obtain moment estimators of $\mu$ and $\nu$ (as in I) 
and then to adjust the estimate of $\mu$ to satisfy at least roughly the 
maximum likelihood equation 


(Given , , a second approximation to » can be found from 
S(us) 


where 
fl) = 1— — exp 


When it is not possible to proceed as above, either of the two earlier 
methods may be used to obtain first approximations, though it seems 
to be worth making some effort to try to satisfy at least roughly the 
maximum likelihood equation of this paragraph. 

Whatever the method, the expected frequencies ¢/ are found from 


= 
and if using the Table, from 


(where SP; = 8; 


*Rather than to make repeated use of

$$\nu_{r+1} = \nu_r - F(\nu_r)/F'(\nu_r),$$

it may be preferable to retain a constant value for $F'(\nu)$ once a reasonable estimate of $\nu$ has been
obtained.

As an illustration, unpublished data of leaf counts of Leucopogon
virgatus (supplied by Dr. D. W. Goodall, of the University of
Melbourne) will serve. There are a priori reasons for suspecting that the
zero class is greatly inflated (but not at the expense of neighboring
classes); the analysis above may therefore be applied. There are no
(positive) moment estimators, so that the analogues (15) of the moment
equations were used and then adjusted as suggested to give first
approximations for $\mu$, $\nu$ of 0.293, 2.17. Use of equations (19) and (20) led to
second approximations 1.10, 1.44 and then to third approximations
1.49, 1.21 (for which $F(\nu) = -0.002$, so that the limit of accuracy of
the Table has practically been reached; the example is rather an awkward
one, in that $F'(\nu)$ is also rather small near the root of $F(\nu)$). These in
turn give the expected frequencies $\phi'_x$ shown:


TABLE II.
x         |   0   |  1   |  2   |  3   |  4   |  5   |  6  |  7+
f_x       | (798) | 70   | 41   | 33   | 29   | 11   | 7   | 11
φ'_x      | (109) | 58.6 | 51.2 | 36.1 | 23.2 | 14.0 | 8.1 | 10.8


and this leads to $\chi^2 = 6.8$, with 4 degrees of freedom, a reasonable fit.
(It is interesting, though perhaps not unexpected, that while estimation
with inclusion of the zero class reproduces $f_x$ for x = 0 rather well, the
divergence for x > 0 is considerable; above, as was anticipated a priori,
the divergence is largely at x = 0.)
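
The goodness-of-fit figure can be checked directly from the frequencies of Table II for x = 1 to 7+ (the bracketed zero class is excluded). The sketch below is an illustration only; the function name is arbitrary, and the count of 4 degrees of freedom is read here as 7 classes less 1 for the fixed total and less 2 for the fitted parameters, which is an assumption about how the figure in the text was obtained.

```python
# Observed and expected frequencies for x = 1, 2, ..., 6, 7+ (Table II).
observed = [70, 41, 33, 29, 11, 7, 11]
expected = [58.6, 51.2, 36.1, 23.2, 14.0, 8.1, 10.8]


def pearson_chi_square(obs, exp):
    """Sum of (observed - expected)^2 / expected over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


chi2 = pearson_chi_square(observed, expected)
# Assumed accounting: classes - 1 (total) - 2 (fitted parameters).
dof = len(observed) - 1 - 2
print(round(chi2, 1), dof)  # 6.8 4
```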


III Tables of $p_x$ and $q_x$


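These give ratios of Poisson power moments about the origin, up
to order 20. More precisely, for

$$\mu'_x = \sum_{r=0}^{\infty} r^x e^{-\lambda} \lambda^r / r!,$$

then

$$p_x = \mu'_{x+1}/\mu'_x,$$

and $p_x$ is tabulated for
$x = 0(1)19$ and $\lambda = 0.000(0.001)0.03(0.01)0.3(0.1)3$.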


Corresponding values of $q_x = p_x(p_{x+1} - p_x)$ are also shown, in each
case with rather more decimals for small values of x, for various reasons.
(E.g., the number of significant digits thus alters less; $f_x$ for small
values of x is relatively large, in many cases, so that more digits are
needed when the sum $\sum f_x p_x$ is required to a fixed number of decimals;
and linear interpolation is adequate to more decimals for small x, this
being the chief factor in deciding at what points to truncate the tabulated
values.)
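
The printed entries appear consistent with $p_x$ being the ratio of successive Poisson moments about the origin, $\mu'_{x+1}/\mu'_x$; for instance the row for argument 0.1 in Table I reads $p_0 = 0.100$, $p_1 = 1.100$, $p_2 = 1.191$. That reading is an inference from the tabulated values, and the sketch below, whose function names are arbitrary, should be taken as a way of checking or extending the Tables rather than as the method by which they were computed.

```python
from math import comb


def poisson_raw_moments(lam, order):
    """Moments about the origin mu'_0 .. mu'_order of a Poisson(lam) variate.

    Uses the recurrence mu'_{n+1} = lam * sum_k C(n, k) * mu'_k.
    """
    m = [1.0]
    for n in range(order):
        m.append(lam * sum(comb(n, k) * m[k] for k in range(n + 1)))
    return m


def p_values(lam, x_max=19):
    """p_x = mu'_{x+1} / mu'_x for x = 0 .. x_max (requires lam > 0)."""
    m = poisson_raw_moments(lam, x_max + 1)
    return [m[x + 1] / m[x] for x in range(x_max + 1)]


def q_values(lam, x_max=18):
    """q_x = p_x * (p_{x+1} - p_x) for x = 0 .. x_max (requires lam > 0)."""
    p = p_values(lam, x_max + 1)
    return [p[x] * (p[x + 1] - p[x]) for x in range(x_max + 1)]


p = p_values(0.1)
q = q_values(0.1)
print([round(v, 3) for v in p[:4]])
print([round(v, 3) for v in q[:3]])
```

At an argument of exactly zero the ratios for x > 0 are of the form 0/0, so the zero row of the Tables has to be taken as a limit rather than computed from this recurrence.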

The horizontal lines across the Tables show the regions above which
linear interpolation (with respect to $\lambda$) for $p_x$ or $q_x$ is hardly adequate
to the accuracy of the Tables. (Some “smoothing” of these lines has
been carried out; they are approximate rather than exact.) Where
such regions are appropriate, or where greater accuracy than the Tables
afford may be required, the Tables can still be used to obtain corrected
first approximations for use in the direct Shenton procedure (equations
(4), (5) and (6); or (16), (17) and (18)).
APPENDIX—TABLE I
0.0 0.000 0.000 1.000 0.000 1.000 0.000 1.00 0.000 
0.1 | 0.100 1.100 1.191 1.34 
0.100 0.100 0.183 0.300 
0.2 | 0.200 1.200 1.367 1.61 
0.200 0.200 0.339 0.491 
0.3 | 0.300 1.300 1.531 1.84 
0.400 0.400 0.604 0.765— 
0.5 0.500 1.800 1:888 2.28 
0.6 | 0.600 1 600 1.975+ 2.40 
0.600 0.600 0.834 0.989 
0.7 | 0.700 1.700 2.112 2.56 
0.700 0.700 0.942 1.095-+ 
0.8 | 0.800 1.800 2.244 2.71 
1.198 
0.900 | 0.900 1.149 1.299 
1.0 | 1.000 2-000 2-500 3.00 
1.1 | 1.100 2.100 2.624 3.14 
1.100 1.100 1.349 1.500 
1.2 200 2.200 2.745+ 3.97 
1.200 1.200 1.448 1.599 
1.3. | 1.300 2.300 2.865+ 3.40 
1.400 ~ 1.400 1.643 1.795+ 
1.6 | 1.600 2.600 3.215+ 3.787 
1.600 1.600 1.837 1.991 
1.7 | 1.700 2.700 3.330 3.910 
1.700 1.700 1.933 2.088 
1.8 | 1.800 2.800 3.443 4.032 
1.900 1.900 9.196 2.283 
2.1 | 2.100 3.100 3.777 4.391 
2.100 2.100 2.319 2.477 
2.2 | 2.200 3.200 3.888 4.509 
2.200 2.200 2.415— 2.574 
2.3 | 2.300 3.300 3.997 4.625+ 
2.400 2.400 2.608 2.768 
2.5 | 2.500 3.500 4.214 429, 4886 
2.6 | 2.600 3.600 4.322 4.970 
2.600 2.600 2.801 2.962 
2.7 | 2.700 3.700 4.430 5.084 
2.700 2.700 2.897 3.058 
2.8 | 2.800 3.800 4.537 5.197 
2.900” 2.900 3.098 3.252 
APPENDIX—TABLE I (Continued)
Ps qs Ps Ps qs P7 
0.0} 1.00 0.000 1.00 0.000 1.00 0.00 1.00 0.00 
0.1} 1.57 1.84 2.11 2.36 
9.49) 0.498 0.53 0.55+ 
0.2] 1.92 2.23 2.53 2.80 
; 0.601 0.658 0.70 0.75+ 
0.3} 2.19 2.53 2.84 3.14 
0.4| 2.42 0.737 2.77 0.797 3.11 0.86 3.43 0.92 
0.5) 2.62 0.975+ 2.99 1.053 3.35— 1.18 3.69 1.21 
0.6) 2.81 3.20 3.56 3.92 
1.087 1.174 1.26 1.35— 
0.7 | 2.99 3.39 3.77 4.14 
1.196 1.291 1.38 1.47 
0.8} 3.15+ 3.57 3.96 4.34 
3.31 1.305-— 374 1.405— 4.14 1.50+ 4.53 1.59 
1.0| 3.47 1515+ 3.90 1.626 4.32 1.73 4.72 1.83 
1.1] 3.62 4.06 4.49 4.90 
1.619 1.733 1.84 1.94 
1.2| 3.76 4.22 4.66 5.07 
1.722 1.840 1.95+ 2.05+ 
1.3] 3.90 4.37 4.82 5.24 
1.4| 4.04 1.824 4.52 1.944 4.97 2.06 5.41 2.17 
1.5) 4.178 2.027 4.663 2.152 5.12 2.97 5.57 2.38 
1.6 | 4.312 4.806 5.27 5.73 
2.127 2.255— 2.38 2.49 
1.7 | 4.444 4.945+ 5.42 5.88 
2.227 2.357 2.48 2.60 
1.8 | 4.574 5.083 5.57 6.03 
1.9| 4.703 2.327 5.219 2.458 5.71 2.58 6.18 2.70 
2.425+ 2.559 2.69 ‘ 2.81 
2.0} 4.830 2.524 5.352 2.660 5.85— 2.79 6.33 2.91 
2.1 | 4.955+ 5.485— 5.99 6.47 
2.623 2.760 2.89 3.01 
2.2| 5.080 5.615+ 6.12 6.61 
2.721 2.860 2.99 3.12 
2.3 | 5.203 5.745— 6.26 6.75+ 
0.4] 5.325— 2.819 5.873 2.959 6.39 3.09 6.89 3.22 
2.5| 5.446 3.015+ 6.000 3.158 6.53 3.29 7.03 3.42 
2.6| 5.566 6.125+ 6.66 
3.113 3.257 3.39 3.52 
2.7| 5.6854+ 6.250 6.79 7.30 
3.211 3.356 3.49 3.63 
2.8) 5.804 6.374 6.92 7.44 
2.9| 5.922 3.309 6.497 3.454 7.04 3.59 7.57 3.73 
3.0] 6.039 3.504 6.619 3.650-+ 3.79 7.70 3.93 
APPENDIX—TABLE I (Continued)
Ps Po Pio Pi 
0.0; 1.00 1.00 
0.1} 2.59 2.82 
0.2| 3.07 0.81 3.34 
0.3 | 3.44 0.99 3.72 
3.74 1.14 4.05- 
0.5| 4.01 4.34 
4.26 4.60 
0.7 | 4.49 155+ 4.84 
0.8] 4.71 1.68 5.06 
0.9] 4.91 1.80 5.28 
1.0} 5.11 5.48 
1.1} 5.30 5.68 
1.2] 5.48 215+ 5.87 
1.3) 5.66 9.97 6.06 
1.4] 5.83 2.38 6.24 
1.5] 6.00 6.41 
1.6] 6.16 6.58 
1.7 6.22 2.71 6.75— 
1.8 | 6.48 2.81 6.91 
1.9} 6.63 2.92 7.07 
2.0| 6.79 7.23 
2.1] 6.94 7.39 
2.2| 7.08 3 94 7.54 
2.3] 7.23 3.34 7.69 
2.4| 7.37 3.44 7.84 
2.5| 7.52 7.99 
2.6| 7.66 8.14 
2.7| 7.80 3.754 8.28 
7.94 3.85+ 8.42 
2.9] 8.07 3.96 8.56 
3.0] 8.21 4.06 8.70 
0.00


0.76 
1.01 
1.22 
1.39 


1.56 
1.71 
1.86 
2.00 
2.13 


2.26 
2.39 
2.51 
2.64 
2.76 


2.88 
3.00 
3.11 
3.23 
3.34 


3.45+ 


3.56 
3.68 
3.79 
3.89 


4.00 
4.11 
4.22 
4.33 
4.43 


4.54 


0.00 
0.87 
1.15— 
1.37 
1.56 


1.74 
1.90 
2.06 
2.21 
2.35+ 


2.49 
2.63 
2.76 
2.89 
3.02 


3.14 
3.26 
3.39 
3.51 
3.62 


3.74 
3.86 
3.97 
4.09 
4.20 


4.31 
4.43 
4.54 
4.65— 
4.76 


4.87 


que Pis ais Pia Pis = 
1.00 1.00 1.00 1.00 | 
4 
0.1| 3.49 3.71 3.93 4.14 
0.2| 4.10 435+ 450 4.88 
0.3| 4.55+ 4.82 cae 5.08 an 5.34 
0.4| 4.93 5.21 5.76 
‘ 0.5| 5.26 5.55+ 5.85— 6.13 
5.55+ 5.86 6.17 6.46 
7 0.7| 5.83 6.15— 1.93 6.46 1.99 6.77 | 
0.8| 6.09 6.41 6.74 7.08+ 
0.9| 6.33 6.67 7.00 7.82 
te 1.0] 6.56 6.91 7.24 7.58 
1.1| 6.78 7.13 7.48 7.82 
1.2| 7.00 7.36 7.71 8:08 
7.20 7.57 7.93 8.28 
1.4] 7.40 7.97 
1.5] 7.60 7.98 8.35—- 8.71 
1.6] 7.79 8.17 8.55— 4 8.92 
1.7| 7.97 8.36 8.74 339 12 
1.8] 8.15+ 8.94 «9.82 
1.9| 8.33 8.73 9.12 
. 2.0| 8.50+ 8.91 9.31 9.70 
2.1| 8.67 9.09 9.49 9.89 
2.2| 8.84 9.26 see 9.67 3 19-07 
2.3| 9.01 9.43 9.84 
2.4| 9.17 9.60 10.01 419 10-42 
- 2.5| 9.33 9.76 10.18 10.60 
2.6| 9.49 9.93 spr 10.35+ mend 10.77 | 
2.7| 9.65— 10.09 10.52 443. 10.94 | 
2.8| 9.81 10.68 11.10 
2.9| 9.96 10.40 10.84 11.27 
APPENDIX—TABLE I (Concluded)


Pris que 
0.0} 1.00 
0.1} 4.35+ 
0.2} 5.07 
0.3} 5.60 1.42 
0.4| 6.04 162 
0.5} 6.42 
0.6| 6.76 
0.7| 7.07 2.12 
7.37 2.98 
0.9} 7.64 49 
1.0} 7.91 56 
1.1} 8.16 2.70 
1.2] 8.40 
2.84 
8.63 
1.4] 8.86 
3.10 
1.5] 9.07 
1.6| 9.29 
1.7| 9.49 3.47 
1.8] 9.69 3.59 
1.9} 9.89 3.71 
2.0} 10.09 
10.46 4.07 
2.3 | 10.65— 4.18 
2.4 | 10.83 4.30 
2.5] 11.00 
2.6 | 11.18 
2.7 | 11.35+ 4.64 
2.8 | 11.52 4.75- 
2.9 | 11.69 4.86 
3.0} 11.86 


4.97 


Qiz Pis 
1.00 is Pis 
4.56 0.00 1.00 
5.31 ° 0.94 4.77 0.00 1.00 
5.85+ 1.23 5.54 0.98 _ 4.97 ea 
6.30 1.47 6.10 1.28 5.77 cir 
6.70 1.72 6.83 
7:05— 1.85+ 6.97 
7.37 2.03 7.34 1.91 7.25— 
7.68 2.19 7.67 2.09 7.62 - 
7.96 2.34 7.98 2.25— 7.96 
2.49 8.27 2.41 8.28 . 
8.23 2.56 8.58 
8.49 2.64 8.55+ 
8.74 2.78 8.82 2.71 8.87 Ty 
9.21 3.05- 2.99 9.40 
3.18 9.55+ 3.12 9.65— ha 
9.43 
965- 9.78 
9.86 3.43 10.00 3.39 «10-18 
10.07 3.56 10-22 3.52  10-35+ 
10.27 3.68 10.43 3.64 10-58 aA 
g.g9 10.64 77 10-78 
10.47 
10.66 3.92 10.84 ae 
10.354 11.04 11.21 
11.04 11-28 aag at! 
11.22 4.28 11.43 4,.25+ 11.61 
ass 
11.41 4.49 12-00 
11.58 4.5, 11-80 
11.76 4.62 11-98 461 12:19 
11.94 474 12-16 a72 
4.96 12.52 12.74 
5.07 12.69 vel 
5.18 19:10 
0.00 |0.0000 1.0000 0.0000 1.0000 1.000
0.01 |0.0100 1.0100 1.0199 1.039 

0.0100 0.0100 0.0198 0.039 
0.02 |0.0200 1.0200 1.0396 1.077 

0.0200 0.0200 0.0392 0.075— 
0.03 |0.0300 1.0300 1.0591 1.114 
0.04 10.0100 99300 9.0300 _0:0583 2-200 

0.05 |0.0500 0.9500 1:0500 1.0976 
0.06 |0.0600 1.0600 1.1166 1.218 

0.0600 0.0600 0.1134 0.199 
0.07 |0.0700 1.0700 1.1354 1.251 

0.0700 0.0700 0.1311 0.226 
0.08 |0.0800 1.0800 1.1541 1.283 

‘ 0.0900 0.0900 0.1658 0.276 
0.10 |0.1000 0.1000 1:1000 1-1909 
0.11 {0.1100 1.1100 1.2091 1.374 

0.1100 0.1100 0.1993 0.322 
0.12 |0.1200 1.1200 1.2271 1.403 

0.1200 0.1200 0.2157 0.344 
0.13 |0.1300 1.1300 1.2450+ 1.431 
0.14 |0.1400 9-1300 0.1800 92318 1459 0-365- 

0.1400 0.1400 0.2477 0.384 
0.15 {0.1500 0.1500 1:1500 1.2804 1486 
0.16 |0.1600 1.1600 1.2979 1.513 

0.1600 0.1600 0.2789 0.422 
0.17 {0.1700 1.1700 1.3153 1.539 

0.1700 0.1700 0.2942 0.440 
0.18 {0.1800 1.1800 1.3225+ 1.565— 
0.19 10.1900 92809 "i999 90-1800 0.3093 «(0-458 

0.20 |0.2000 0.2000 1:2009 1-3667 1.615- ao, 
0.21 |0.2100 1.2100 1.3836 1.639 

0.2100 0.2100 0.3534 0.508 
0.22 |0.2260 1.2200 1.4003 1.663 

0.2200 0.2200 0.3678 0.523 
0.23 |0.2300 1.2300 1.4170 1.687 
0.24 10.2400 0-200 143354 0-8820 i719 (0-539 

0.25 |0.2500 0.2500 1:2500 1.4500 
0.26 10.2600 1.2600 1.4663 1.755+ 

0.2600 0.2600 0.4238 0.583 
0.27 |0.2700 1.2700 1.4826 1.778 

0.2700 0.2700 0.4374 0.597 
0.28 |0.2800 1.2800 1.4988 1.800 
0.29 10.2900 92800 90-2800 15148 9480935, 0.611 

0.2900 0.2900 ~~ 0.4643 0.625+ 
0.30 |0.3000 0.3000 1-3000 1-5308 0.47754 1-848 9.939 




NEYMAN TYPE A DISTRIBUTION 
APPENDIX—TABLE II (Continued) 
0.00 | 1.000 1.00 1.00 
0.01 | 1.077 1.15- 1.26 
0.073 0.13 0.21 
0.02 | 1.147 1.26 1.44 
0.135— 0.22 0.31 
0.03 | 1.212 1.37 1.58 
0.05 | 1.329 1.53 1.78 
0.06 | 1.382 1.61 1.86 
0.309 0.40 0.45+ 
0.07 | 1.432 1.67 1.93 
0.341 0.43 0.47 
0.08 | 1.479 1.73 1.99 
0.09 | 1.524 1.78 2.05+ 
0.10 | 1.567 1.84 2.11 
0.11 | 1.608 1.88 2.16 
0.443 0.52 0.55— 
0.12 | 1.648 1.93 2.21 
0.464 0.53 0.56 
0.13 | 1.686 1.97 2.25+ 
0.15 | 1.758 2.05+ 2.34 
0.16 | 1.792 2.09 2.38 
0.538 0.60 0.63 
0.17 | 1.825+ 2.13 2.42 
0.555— 0.61 0.65+ 
0.18 | 1.857 2.16 2.46 
0.19 | 1.889 6.571 2.20 0.68 2.49 0.67 
0.20 | 1.919 9.001 2.23 oes 2.53 0.70 
0.21 | 1.949 2.26 2.56 
0.616 0.67 0.72 
0.22 | 1.978 2.30 2.60 
0.631 0.69 0.73 
0.23 | 2.006 2.33 2.63 
| 0.645— 0.70 0.75- 
0.25 | 2.061 2.39 2.69 axe 
0.26 | 2.088 2.42 2.72 
0.685+ 0.74 0.80 
0.27 | 2.114 2.44 2.75+ 
0.699 0.76 0.81 
0.28 | 2.139 2.47 2.78 
0.29 | 2.165—- 2.50— 2.81 0.86 
- 0.724 0.78 0.84 
0.30 | 2.189 0.737 2.53 0.80 2.84 0.96 


1.00 0.00 
1.43 
1.65+ 
0.36 
1.81 
1.92 0.39 

0.42 
0.44 
2.10 

0.46 
2.17 

0.49 
2.24 
2.30 0.51 

0.53 
2.41 

0.57 
2.46 
0.60 
2.51 
2.56 0.62 

0.64 
2.60 ge 
2.65- 

0.68 
2.69 
0.70 a 

2.73 

0.74 

th 

2.84 

0.77 

2.88 eft 

0.79 bay 

2.91 1 

0.82 

0.86 
8.05- 
3.08 
3.11 0.89 

0.91 
APPENDIX—TABLE II (Continued)


0.00 | 1.00 0.00 1.00 0.00 1.00 0.00 1.00 0.00 ( 
0.01 | 1.63 1.83 2.01 ——— 2.16 ( 
0.33 ———. 0.32 0.30 9.30 
0.02 | 1.87 2.07 2.24 2.40 ( 
0.36 0.35+ 0.36 0.39 
0.03 | 2.02 2.22 2.39 2.57 i 
0.04 | 2.14 0.30 2.34 O80 - 2.52 ae 2.71 pages 0 
0.05 | 2.24 cee 2.44 nas 2.63 0.52 2.83 ase 0 
0.06 | 2.32 2.53 2.73 2.93 0 
0.48 0.51 0.56 0.60 
0.07 | 2.40 2.61 2.82 3.03 0 
0.51 0.55- 0.59 0.63 
0.08 | 2.47 2.68 2.90 3.12 0 
0.09 | 2.53 0.64 2.75+ paged 2.98 O58 3.20 0.67 0 
0.10 | 2.59 2.82 3.05— 0.60 3.27 0 
0.11 | 2.65-— 2.88 3.11 3.34 0 
0.62 0.67 0.71 0.75+ 
0.12 | 2.70 2.94 3.18 3.41 0 
0.64 0.69 0.74 0.78 
é 0.13 | 2.76 3.00 3.24 3.47 0 
0.15 | 2.85+ 0.71 3.10 0.76 3.35 0.81 3.59 0.85+ 
0.16 | 2.90 3.15+ 3.40 3.65— 0 
0.73 0.79 0.83 0.88 
0.17 | 2.95-— 3.20 3.45+ 3.70 0 
0.75+ 0.81 0.86 0.90 
0.18 | 2.99 3.25— 3.50+ 3.75+ 0. 
0.19 | 3.63 0.77 3.20 0.83 5 554 0.88 3.80 0.92 0 
0. 
0.20 | 3.07 9.81 3.34 0.87 3.60 oes 3.85+ 0.97 
0.21 | 3.11 3.38 3.64 3.90 : 0. 
0.83 0.89 0.94 0.99 
0.22 | 3.15+ 3.42 3.69 3.95— 0. 
0.85+ 0.91 0.96 1.01 
0.23 | 3.19 3.46 3.73 3.99 0. 
0 
0.25 | 3.26 0.00 3.54 aes 3.81 2. 4.08 1.07 
0.26 | 3.30 3.58 3.85+ 4.12 0. 
0.92 0.98 1.03 1.09 
0.27 | 3.34 904 3.62 a. 3.89 105+ 4.16 “rr 0. 
0.28 | 3.37 3.65+ 3.93 ‘ 4.20 0 
| 0.96 1.02 3.97 1.07 1.12 0 
. 0. 
0.30 |°3.44 0.90 3.72 4.01 4.28 148 


APPENDIX—TABLE II (Continued)


Pus qi2 dis qia 
0.00 | 1.00 
0.01 | 2.30 2.44 2.59 

0.32 0.36 0.40 
0.02 | 2.56 2.73 2.90 

0.43 0.47 0.49 
0.03 | 2.75— 2.93 3.11 

0.55+ 0.58 0.60 
0.06 | 3.14 3.34 3.54 

0.63 0.66 0.69 
0.07 | 3.24 3.45— 3.65— 

0.67 0.70 0.73 
0.08 | 3.33 3.54 3.75— 
0.09 | 3.42 6.79 a 

0.11 | 3.57 3.79 4.01 

0.79 0.83 0.87 
0.12 | 3.64 3.86 4.09 

0.82 0.86 0.90 
0.13 | 3.70 3.93 4.16 
0.14 | 3.77 085- 400 939 ~—0..98 

0.87 0.91 0.95+ 
0.15 | 3.83 
0.16 | 3.89 4.13 4.36 

0.92 0.96 1.01 
0.17 | 3.94 4.18 4.42 

0.94 0.99 1.03 
0.20 | 4.10 4.35+ 4.59 
0.21 | 415+ am CO 

1.03 1.08 1.13 

0.26 | 4.39 4.65— 4.90 

1.14 1.19 1.24 
0.27 | 4.43 4.69 4.95— 

1.16 1.21 1.26 
0.28 | 4.47 4.99 


dis a 
1.00 
2.74 

0.43 
3.07 
6.5) 4 
3.29 
3.46 0.57 

0.63 
3.61 
3.73 

0.72 
3.85— 

0.76 
3.95+ 
4.05+ 0.80 

0.84 
4.14 
4.22 

0.90 
4.30 

0.93 
4.38 
4.45+ 
4.52 
4.59 

1.07 
4.72 

1.12 
4.89 

1.17 
4.95— 
5.00+ 

5.05+ 
5.10 
as 

1.29 
5.20 
5.254 
5.30 1.33 

1.35— 
5.34 
APPENDIX—TABLE II (Concluded)


Pis die Piz Pis Pis 
0.00 | 1.00 1.00 1.00 1.00 
0.01 | 2.90 3.05+ 3.20 3.35— 
0.03 | 3.46 0.60 3.64 0.62 3.81 0.65— 3.98 
0.04 | 3.64 0.66 3.82 0.68 4.00 0.71 4.18 
0.05 | 3.80 3.98 4.17 4.35+ 
0.06 | 3.93 4.12 4.31 4.50-— 
0.07 | 4.05-— 0.80 4.24 0.83 4.44 0.86 4.63 
0.08 | 4.16 0.84 4.36 0.87 4.56 0.90 4.75+ 
446 99, 467 go, 4:87 
0.10 | 4.35+ 4.56 4.77 4.97 
0.11 | 4.44 4.65+ 4.86 5.07 
0.12 | 4.52 0.97 4.74 01 4.95+ 1.04 5.16 
0.13 | 4.60 1.00 4.82 ped 5.03 1.08 §.25— 
0.14 | 4.68 1.03 4.90 1.07 5.12 1.11 5.33 
0.15 | 4.75-— 4.97 5.19 5.41 
0.16 | 4.82 5.04 §.27 5.49 
0.17 | 4.88 1.12 5.11 1.16 5.34 1.20 5.56 
0.18 | 4.95-— 1.14 5.18 1.18 5.41 1.22 5.63 
0.19 | 5.01 1.17 5.24 1.21 5.48 1.25- 5.70 
0.20 | 5.07 5.31 5.54 5.77 
0.21 | 5.13 5.37 5.60 5.84 
0.22 | 5.19 1.24 5.43 1.28 5.66 1.33 5.90 
0.23 | 5.24 1.26 5.49 1.31 5.72 135+ 5.96 
0.24 | 5.30 1.29 5.54 1.33 5.78 1.38 6.02 
0.25 | 5.35+ 5.60 5.84 6.08 
0.26 | 5.40 5.65— pen 5.89 = 6.14 
0.27 | 5.45+ 1.35+ 5.70 1.40 5.95— 1.45— 6.19 
0.28 | 5.50+ 1.38 5.75+ 1.42 6.00 1.47 6.25— 
0.29 | 5.55+ 1.40 5.80 1.45— 6.05+ 1.49 6.30 
0.30 | 5.60 1.42 5.85+ 1.47 6.10 1.51 6.35+ 


APPENDIX—TABLE III


0.000 .000 00 1.000 00000 1.000 90000 1.000 000 
0.001 .001 1.001 1.002 1.004 

.001 -00100 .00199 .0040 
0.002 .002 1.002 1.004 1.008 

.002 .00200 -00399 .0079 
0.003 .003 1.003 1.006 1.012 
0.004 004 .003 1.004 .00300 1.008 -00599 1.016 .0119 

.004 -00400 .00797 .0158 
0.005 -005 005 1.005 00500 1.010 009954 1.020 0197 
0.006 | .006 1.006 1.612 1.024 

.006 -00600 .01193 .0235+ 
0.007 .007 1.007 1.014 1.028 

.007 -00700 .01390 .0273 
0.008 .008 1.008 1.016 1.032 
0.009 009 .008 1.009 -00800 1.018 .01587 1.035-+4 .0311 

0.010 | .010 010 1.010 01000 1.020 01981 1.039 0387 
0.011 O11 1.011 1.022 1.043 

011 .01100 .02157 .0424 
0.012 -012 1.012 1.024 1.047 

.012 .01200 .02371 .0461 
0.013 1.013 1.026 1.051 
0.014 014 .013 1.014 .01300 1.028 .02567 1.055— .0497 

.014 -01400 7 .02762 .0534 
0.015 | .015 015 1.015 01500 1.030 029554 1.058 0570 
0.016 .016 1.016 1.032 1.062 

.016 -01600 .03150+ .0606 
0.017 .017 1.017 i 1.034 1.066 

.017 .01700 03343 .0642 
0.018 -018 1.018 1.036 1.070 
0.019 019 .018 1.019 -01800 1.038 .03537 1.074 .0678 

0.020 .020 020 1.020 02000 1.040 03922 1.077 0748 
0.021 .021 1.021 1.042 1.081 

.021 .02100 .04114 .0783 
0.022 .022 1.022 1.044 1.085— 

.022 .02200 ‘04306 .0818 
0.023 .023 1.023 1.045+ 1.089 
0.024 024 .023 1.024 .02300 1.047 .04998 1.092 .0852 

0.025 .025 025 1.025 02500 1.049 04880 1.096 0920 
0.026 .026 1.026 1.051 a 1.100 

.026 -02600 .05070 .0954 
0.027 .027 1.027 = 1.053 is 1.103 

.027 -02700 -05260 .0988 
0.028 .028 1.028 1.055+ 1.107 
0.029 029 .028 1.029 .02800 1.057 .05449 Lil .1021 

.029 ‘ -02900 .05639 .1054 
0.030 | .030 030 1.030 03000 1.059 05827 1.114 1.087 


Ps


Pe 


qs q7 
0.000 | 1.000 000 1.000 000 1.00 000 1.00 000 
0.001 | 1.008 1.016 1.03 1.06 
.0079 016 .031 .058 
0.002 | 1.016 1.031 1.06 1.12 
.0157 .031 .058 105+ 
0.003 | 1.024 1.046 1.09 1.17 
0.004 | 1.031 0283s 061 1.21 
0.005} 1.039 1.076 1.14 125+ 49, 
0.006 | 1.047 1.090 1.17 1.29 
.0455— .085— 147 .227 
0.007 | 1.054 1.104 1.19 1.33 
097 165+ .247 
0.008 | 1.062 1.118 1.22 1.36 
.0665- .120 197 .278 
0.010 | 1.077 a 
0.011 | 1.084 1.157 1.28 1.45+ 
.0799 141 224 .302 
0.012 | 1.091 1.170 1.30 1.48 
.0864 151 236 311 
0.013 | 1.098 1.183 1.32 1.51 
0.014) 1.105+ “8978 19,  -819 
0.015 | 1.112 1.207 1.36 155+ 
0.016 | 1.119 1.219 1.37 1.58 
188 277 341 
0.017 | 1.126 1.231 1.39 d 160 ° 
1175- 197 286 346 
0.018 | 1.133 1.242 1.41 1.62 
0.020 | 1.147 1.264 1.44 1.65+ 
316 .365- 
0.022 | 1.160 1.286 1.47 1.69 
1461 .235- .368 
0.023 | 1.167 1.297 1.48 1.70 
0.024 | 1.173 1516 1.307 
0.026 | 1.186 1.328 1.52 1.75+ 
.1676 261 .345— .382 
0.027 | 1.193 1.338 1.54 1.77 
1727 .268 .350— .385— 
0.028 | 1.199 1.347 1.55+ 1.78 
0.029/1.205+ 1778 1.357 1.56 
0.030 | 1.212 1.367 1-88 1.81 


APPENDIX—TABLE III (Continued)

qs Po me qio qu 
0.001 | 1.12 1.21 1.35— 1.53 

104 170 249 ang 
0.002 | 1.21 1.35+ 1.53 1.72 
173 246 288 278 
0.003 | 1.29 1.46 1.65+ 1.83 
253 301 263 
0.006 | 1.47 1.67 1.86 2.02 
315+ 294 268 
0.007 | 1.52 1.72 1.90 2.06 
308 319 295- 275+ 
0.008 | 1.56 1.76 1.94 2.10 
0.009 | 1.60 “28 321 -296 
323 * 291 
0.011 | 1.66 1.86 2.04 2.19 
337 326 307 310 
0.012 | 1.69 1.89 2.07 2.22 
341 329 312 319 
0.013 | 1.72 1.92 2.09 2.24 
348 333 323 338 
0.016 | 1.79 1.99 2.16 2.31 
354 339 356 
0.017 | 1.81 2.01 2.18 2.34 
356 342 341 364 
0.018 | 1.83 2.03 2.20 2.36 
0.019 | 1.85+ oo5—- “346 ggg «378 
361 349 353 381 
0.021 | 1.89 2.08 2.254 2.42 
366 357 366 397 
0.022 | 1.91 2.10 2.27 2.44 
369 360 372 405— 
0.023 | 1.92 2.12 2.29 2.454 
374 368 384 420 
0.085 | 105+ 282 240 
0.026 | 1.97 2.16 2.34 2.51 
379 377 396 434 
0.027 | 1.98 2.18 2.35+ 2.52 
382 381 402 440 
0.028 | 2.00 2.19 2.37 2.54 
0.029 “84 909 -385- 95, -408 -447 
0.030 | 2.02 59g 2.22 ggg 


APPENDIX—TABLE III (Continued)


0.000 1.00 000 1.00 000 1.00 000 1.00 000 
0.001 Bef | 1.86 1.98 2.07 

.264 220 .184 175- 
0.002 1.88 2.01 2.21 2.21 
.239 209 .207 230 
0.003 1.98 2.10 2.21 2.32 
0.004 2.05— 2.17 2.28 2.40 
4 .241 247 .276 316 
0.005 2.10 253 2.22 270 2.34 305— 2.47 345— 
0.006 2.15+ 267 2.27 291 2.40 399 2.54 368 
0.007 2.19 981 2.32 311 2.45+ 351 2.60 387 
0.008 2.23 2.36 2.50 2.65+ 
0.009 2.27 2.40 2.55— 2.70 
.310 346 ‘ .386 417 
0.010 2.30 323 2.44 362 2.59 400 2.74 429 
0.011 2.33 2.48 2.63 2.78 
.337 376 413 440 
0.012 2.36 2.51 2.66 2.82 
.349 389 .425— 450- 
0.013 2.39 2.54 2.70 2.86 
0.014 2.42 2.57 2.73 2.89 
0.015 2.44 383 2.60 423 2.76 455— 2.93 476 
0.016 2.47 2.63 2.79 2.96 
.393 432 463 484 
0.017 2.49 2.65+ 2.82 2.99 
.403 442 A471 491 
0.018 2.52 2.68 2.85— 3.02 
0.019 2.54 2.70 2.87 3.04 
0.020 2.56 430 2.73 467 2.90 494 3.07 513 
0.021 2.58 2.75+ 2.92 3.09 
.438 474 501 519 
0.022 2.60 2.77 2.95— 3.12 
.446 481 507 526 
0.023 2.62 2.79 2.97 3.14 
0.024 2.64 2.82 2.99 3.16 
0.025 2.66 467 2.84 501 3.01 526 3.19 545— 
0.026 2.68 474 2.86 507 3.03 532 3.21 551 
0.027 2.70 j 2.87 3.05+ ‘ 3.23 
481 514 537 557 
0.028 2.71 2.89 3.07 3.25— 
0.029 2.73 2.91 3.09 3.27 
0.030 2.75-— 499 2.93 531 3.11 554 3.29 75- 
0.000 | 1.00 1.00 1.00 1.00
0.001 | 2.16 = 2.25— 2.35- = 2.46 
0.002 | 2.32 poe 2.43 pos 2.56 po 2.69 
0.003 | 2.44 por 2.57 2.70 2.84 
0.004 | 2.53 2.67 2.81 2.95+ 
0.005 | 2.61 2.76 2.90 3.04 
0.006 | 2.68 pons 2.83 yr 2.98 pe 3.12 
0.007 | 2.75—- 2.90 pri 3.04 pe 3.19 
0.008 | 2.80 pl 2.95+ 3.10 3.24 
0.009 | 2.854 3.01 3.15+ 3.30 
0.010 | 2.90 3.05+ 3.20 3.35— 
0.011 | 2.94 \e 3.10 = 3.25— = 3.40 
0.012 | 2.98 pon 3.14 3.29 3.44 
0.013 | 3.02 3.18 3.33 3.48 
0.014 | 3.06 een 3.21 ph 3.37 pies 3.52 
0.015 | 3.09 3.25- 3.40 3.56 
0.016 | 3.12 pox 3.28 mg 3.44 ps 3.59 
0.017 | 3.15+ ‘oan 3.31 ane 3.47 yee 3.63 
0.018 | 3.18 po 3.34 3.50 3.66 
0.019 | 3.21 3.37 3.53 3.69 
0.020 | 3.24 3.40 3.56 3.72 
0.021 | 3.26 3.43 3.59 
0.022 | 3.29 3.45+ 3.62 3.78 
0.023 | 3.31 pr 3.48 “a 3.64 yond 3.81 
0.024 | 3.33 3.50 3.67 3.83 
0.025 | 3.36 3.53 3.69 3.86 
0.026 | 3.38 pon 3.55— yo 3.72 poe 3.88 
0.027 | 3.40 3.57 3.74 3.91 
0.028 | 3.42 3.59 3.76 ‘ae 3.93 
0.029 | 3.44 3.61 3.95+ 
0.030 | 3.46 or 3.64 pe 3.81 -_ 3.98 
USE OF THE SIMPLEX DESIGN IN THE STUDY
OF JOINT ACTION OF RELATED HORMONES 


P. J. CLARINGBOLD 


Department of Veterinary Physiology,
University of Sydney, N.S.W., Australia 


INTRODUCTION 


In certain toxicological studies the term joint action has come to take
a special meaning. If members of a group of related compounds all
cause death of an organism when administered separately, the
simultaneous action of these substances is called their joint action. Bliss
(1939) first discussed the analysis of data obtained in this manner.
Since then the problem has been examined in terms of tolerance
distribution theory and developed in relation to probit analysis (Finney,
1952). Plackett and Hewlett (1951) have extended the tolerance
distribution theory to different theoretical forms of joint action and
developed a set of mathematical models, each of which is based on
many assumptions and is very difficult to fit to experimental data.

Fisher (1954) has shown that parameters of the binomial distribution 
may be estimated without tolerance distribution assumptions. The 
aim of the present paper is to show that the study of joint action by 
means of an appropriate experimental design—the simplex design— 
allows ready interpretation of experimental data with no reference to 
a joint tolerance distribution, and no further assumptions than normally 
required in quantal analysis. The method is also appropriate without 
modification to the study of joint action of substances eliciting a graded 
response simply by applying the standard estimation procedures. 

Examples will be drawn from the study of the action of oestrogens 
on the vagina of the ovariectomized mouse. The quantal response 
in this case is cornification of the vaginal epithelium. 


MATHEMATICAL METHODS 
The simplex design. 


Suppose $A_j$ ($j = 1, 2, \ldots, k$) are the doses of k hormones which,
when administered separately, elicit approximately the same percentage
response. A joint dose, D, may be defined in terms of k coordinates,
$X_j$, which take positive values, thus,


$$D = \sum_{j} A_j X_j, \quad (1)$$
where the coordinate values are restricted by,


$$\sum_{j} X_j = 1. \quad (2)$$


The experimental region is therefore restricted by the above to a
(k - 1) dimensional simplex with vertices at the points on the
coordinates $X_j = 1$. This method of approach allows all different types
of joint doses to be uniquely specified. Thus if $X_j = 1$, the jth hormone
is administered separately. If $X_i + X_j = 1$, then some mixture of
the ith and jth hormones is administered. In both the
study of experimental designs in this region and the analysis of
experimental data it is essential that a (k - 1) dimensional coordinate
system be introduced to the simplex. This may be done in two stages,
(1) shift the origin of the X system to the centroid of the simplex, i.e.
the point where every coordinate has the value 1/k, (2) rotate the axes
so that (say) the kth is orthogonal to the simplex.

The first is accomplished by the simple transformation (3) which
at the same time changes the scale of measurement so that the vertices
have non-fractional coordinates in the new system, $x'$.


$$x' = k(X - \mathbf{1}/k) = kX - \mathbf{1}, \quad (3)$$


where $\mathbf{1}$ is a vector of length k all elements of which are unity.

The second stage is carried out by an orthogonal transformation
(4) of rank k with matrix, $\Phi$. The scale is also modified so the
vertex-centroid distance becomes (k - 1) units.


$$x = k' \cdot x' \cdot \Phi \quad (4)$$
where $k' = [k(k - 1)]^{1/2}/k$ is a scale factor.

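$$\Phi = [k(k-1)]^{-1/2}
\begin{pmatrix}
k-1 & 0 & 0 & \cdots & 0 & s \\
-1 & (k-2)l & 0 & \cdots & 0 & s \\
-1 & -l & (k-3)m & \cdots & 0 & s \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
-1 & -l & -m & \cdots & n & s \\
-1 & -l & -m & \cdots & -n & s
\end{pmatrix}$$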

The additional letters in this matrix are determined from the fact that 
the sum of the squares of the elements of each column is k(k — 1). 
After this transformation all points of the simplex take the value
0 for the kth coordinate, which may therefore be ignored or used to
describe another experimental variable, say different equivalent levels
of dose. An experimental design consists of N points of the experimental
region and may be summarized in a matrix called the design matrix
(Box & Wilson, 1951). The N rows of this matrix give the values of
the coordinates at each of the N experimental points. In the present
case the design matrix is of order N by (k - 1).
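
The two stages can be sketched numerically as follows. The sketch assumes that the orthogonal matrix is of the Helmert type, with first column proportional to (k - 1, -1, ..., -1) and a constant last column, which is what the stated column sums of squares k(k - 1) suggest; the function names are hypothetical and the code is an illustration rather than the procedure used in the paper.

```python
import math


def helmert_matrix(k):
    """k x k orthogonal matrix: k - 1 Helmert-type contrast columns
    followed by a constant last column (an assumed form of the matrix)."""
    phi = [[0.0] * k for _ in range(k)]
    for j in range(k - 1):
        below = k - 1 - j                     # rows under the diagonal entry
        norm = math.sqrt(below * (below + 1))
        phi[j][j] = below / norm
        for i in range(j + 1, k):
            phi[i][j] = -1.0 / norm
    for i in range(k):
        phi[i][k - 1] = 1.0 / math.sqrt(k)
    return phi


def simplex_to_design(X):
    """Map simplex coordinates X (summing to 1) to the rotated system."""
    k = len(X)
    shifted = [k * xj - 1.0 for xj in X]      # stage 1: origin at centroid
    scale = math.sqrt(k * (k - 1)) / k        # scale factor k'
    phi = helmert_matrix(k)
    return [scale * sum(shifted[i] * phi[i][j] for i in range(k))
            for j in range(k)]                # stage 2: rotation


vertex = simplex_to_design([1.0, 0.0, 0.0])
other = simplex_to_design([0.0, 1.0, 0.0])
centroid = simplex_to_design([1 / 3, 1 / 3, 1 / 3])
print(vertex, other, centroid)
```

For k = 3 the first vertex maps to (2, 0, 0), every vertex lies at distance k - 1 = 2 from the origin, and the last coordinate vanishes for all points of the simplex, so only the first k - 1 columns need enter the design matrix.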


An example when k = 2.
In this case equations 1 and 2 become

$$D = X_1 A_1 + X_2 A_2, \quad X_1 + X_2 = 1, \quad 0 \le X_1, X_2 \le 1. \quad (5)$$


The experimental region is a line. For illustrative purposes a design 
matrix consisting of 5 experimental points, including the vertices, the 
centroid and two intermediate points will be transformed using the 
appropriate forms of equations 3 and 4. 


i =} 3 0 
2 211 0 0 0 
2 0 


[X, Xo] = 3[X, Xe]: 
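
A worked version of this example, in exact fractions, is sketched below. The two intermediate points are not legible in the source, so the values (3/4, 1/4) and (1/4, 3/4) are assumptions chosen for illustration; the 2 by 2 matrix is the Helmert-type form that equations 3 and 4 appear to give for k = 2, and the function name is hypothetical.

```python
from fractions import Fraction as Fr


def transform_k2(X1, X2):
    """Apply equations 3 and 4 for k = 2 to a point (X1, X2), X1 + X2 = 1."""
    s1, s2 = 2 * X1 - 1, 2 * X2 - 1   # equation 3: shift origin to centroid
    # equation 4 for k = 2: [x1 x2] = (1/2) [s1 s2] [[1, 1], [-1, 1]]
    x1 = (s1 - s2) / 2
    x2 = (s1 + s2) / 2
    return x1, x2


# Vertices, centroid, and two ASSUMED intermediate points.
design = [
    (Fr(1), Fr(0)),
    (Fr(3, 4), Fr(1, 4)),
    (Fr(1, 2), Fr(1, 2)),
    (Fr(1, 4), Fr(3, 4)),
    (Fr(0), Fr(1)),
]
transformed = [transform_k2(X1, X2) for X1, X2 in design]
for original, new in zip(design, transformed):
    print(original, "->", new)
```

The second coordinate is zero at every point, so the design matrix reduces to the single column 1, 1/2, 0, -1/2, -1 under these assumed points.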


Transformation to a log dose scale. 


Often in biological work response is linearly related to log dose.
In studies on joint action it is of interest to test if this relationship
still holds with respect to log joint dose. Since equation 1 is not linear
in log joint dose, a series of transformations is made so that this
equation holds for the logarithms of the equivalent doses and the log joint
dose in terms of different coordinates. These transformations are all
based on the simple case k = 2.

Suppose equation 5 be written,


$$d = p a_1 + (1 - p) a_2. \quad (6)$$

This equation may be written,


$$\log d = q \log a_1 + (1 - q) \log a_2, \quad (7)$$

if

$$p = (r^{1-q} - r)/(1 - r), \quad \text{where } r = a_2/a_1. \quad (8)$$


Some values of this transformation are given in Table 1. Equation
7 may be put in the form of equation 5 by simple definition of terms,
i.e. $D = \log d$, $A_1 = \log a_1$ and so on.


TABLE 1
Table of the transformation,
$p = (r - r^{1-q})/(r - 1)$,
for equidistant sets of values of q. Geometric intervals of r are tabulated since the
changes in p are more linear with this scale. (Figures in table are all × 10,000.)


q    | r = √2 | 2    | 2√2  | 4    | 4√2  | 8    | 8√2  | 16   | 32   | 64   | 128


1/2  | 5432 | 5858 | 6271 | 6667 | 7040 | 7388 | 7708 | 8000 | 8498 | 8889 | 9188
1/3  | 3725 | 4126 | 4531 | 4934 | 5330 | 5714 | 6083 | 6434 | 7071 | 7619 | 8079
2/3  | 7044 | 7401 | 7735 | 8042 | 8321 | 8571 | 8793 | 8987 | 9298 | 9524 | 9682
1/4  | 2834 | 3182 | 3541 | 3905 | 4271 | 4633 | 4988 | 5333 | 5983 | 6567 | 7082
3/4  | 7815 | 8108 | 8377 | 8619 | 8836 | 9026 | 9191 | 9333 | 9555 | 9710 | 9865
1/5  | 2286 | 2589 | 2904 | 3229 | 3558 | 3889 | 4217 | 4540 | 5161 | 5737 | 6260
2/5  | 4420 | 4843 | 5263 | 5675 | 6074 | 6454 | 6813 | 7148 | 7742 | 8234 | 8632
3/5  | 6410 | 6805 | 7180 | 7530 | 7853 | 8147 | 8411 | 8646 | 9032 | 9321 | 9530
4/5  | 8267 | 8513 | 8736 | 8935 | 9111 | 9263 | 9395 | 9506 | 9677 | 9794 | 9871
1/6  | 1916 | 2182 | 2461 | 2751 | 3047 | 3347 | 3648 | 3947 | 4529 | 5079 | 5589
5/6  | 8564 | 8775 | 8965 | 9134 | 9281 | 9408 | 9517 | 9608 | 9748 | 9841 | 9902
1/10 | 1163 | 1339 | 1528 | 1726 | 1933 | 2146 | 2363 | 2583 | 3023 | 3457 | 3875
3/10 | 3372 | 3755 | 4145 | 4537 | 4925 | 5304 | 5672 | 6024 | 6673 | 7241 | 7728
7/10 | 7354 | 7689 | 7998 | 8281 | 8536 | 8763 | 8962 | 9135 | 9410 | 9606 | 9741
9/10 | 9149 | 9282 | 9401 | 9514 | 9594 | 9670 | 9734 | 9787 | 9866 | 9918 | 9951
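
Entries of Table 1 can be recomputed from the formula in its caption, which also gives a check that the mixing proportion p reproduces the geometric combination of equation 7. The sketch below is illustrative; the function name `mixing_proportion` and the sample doses are assumptions made for the example.

```python
import math


def mixing_proportion(q, r):
    """p = (r - r**(1 - q)) / (r - 1), with r the ratio of the two
    equivalent doses; as r tends to 1 the limit is p = q."""
    if math.isclose(r, 1.0):
        return q
    return (r - r ** (1.0 - q)) / (r - 1.0)


# Reproduce a few tabulated figures (table entries are p x 10,000).
print(round(10000 * mixing_proportion(1 / 2, math.sqrt(2))))
print(round(10000 * mixing_proportion(1 / 2, 2)))
print(round(10000 * mixing_proportion(1 / 3, 2)))

# Check: with r = a2/a1, the arithmetic mixture p*a1 + (1 - p)*a2 equals
# the geometric combination a1**q * a2**(1 - q). Doses are hypothetical.
a1, a2, q = 1.0, 4.0, 0.5
p = mixing_proportion(q, a2 / a1)
arithmetic = p * a1 + (1 - p) * a2
geometric = a1 ** q * a2 ** (1 - q)
print(arithmetic, geometric)
```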


In more general cases the joint dose may be looked on as a series
of equations 6. For example if k = 3 equation 1 may be written,


$$d = p\{p' a_1 + (1 - p') a_2\} + (1 - p) a_3,$$


where $0 \le p, p' \le 1$. The quantity in braces may be regarded as a
quantity, say b, and two transformations of the form of 8 made.
Thus


$$d = \{a_1^{q'} a_2^{1-q'}\}^{q} \, a_3^{1-q}, \quad (9)$$

where

$$p' = (r_1^{1-q'} - r_1)/(1 - r_1), \quad r_1 = a_2/a_1,$$

$$p = (r_2^{1-q} - r_2)/(1 - r_2), \quad r_2 = a_3/b.$$


Equation 9, on taking logarithms and using obvious definitions may be
written,


$$D = X_1 A_1 + X_2 A_2 + X_3 A_3, \quad X_1 + X_2 + X_3 = 1. \quad (10)$$


In this form response may be related to log joint dose and its
components in a simple manner. When equivalent doses are equal,
logarithmic transformations need not be made, since in this case log joint
dose is unaffected by variations in the coordinates subject to the
restriction 2.
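
For k = 3 the nested construction can be traced numerically: an inner mixture of the first two substances gives the quantity b, and an outer mixture of b with the third gives the joint dose. The sketch below assumes the coordinates on the log scale are $X_1 = qq'$, $X_2 = q(1 - q')$ and $X_3 = 1 - q$, which is one reading of the "obvious definitions"; the function names and doses are hypothetical.

```python
import math


def mixing_proportion(q, r):
    """p = (r**(1 - q) - r) / (1 - r); the limit as r -> 1 is q."""
    if math.isclose(r, 1.0):
        return q
    return (r ** (1.0 - q) - r) / (1.0 - r)


def joint_dose_k3(a1, a2, a3, q_inner, q_outer):
    """Return the joint dose d and the log-scale coordinates (X1, X2, X3)."""
    p_inner = mixing_proportion(q_inner, a2 / a1)
    b = p_inner * a1 + (1 - p_inner) * a2        # quantity in braces
    p_outer = mixing_proportion(q_outer, a3 / b)
    d = p_outer * b + (1 - p_outer) * a3
    X = (q_outer * q_inner, q_outer * (1 - q_inner), 1 - q_outer)
    return d, X


a = (1.0, 2.0, 8.0)                              # hypothetical equivalent doses
d, X = joint_dose_k3(*a, q_inner=0.5, q_outer=0.5)
log_d_linear = sum(x * math.log(aj) for x, aj in zip(X, a))
print(math.log(d), log_d_linear, sum(X))
```

The two printed logarithms agree, which is the linearity in the coordinates that the log joint dose is meant to have.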


Extension of the experimental region. 


Different equivalent levels of dose may be chosen for study. The
method by which this is carried out depends on the form of the
relationship of response to dose. In the case where this relationship is
loglinear it is convenient to define each equivalent dose ($A_j$) as a
function of an exponent (n) in terms of constants.


A; = (11) 


The values of the constants chosen depend on the median effective
dose (M.E.D.) and slope of the jth dose response line. Substituting
the logarithm of these equations in equation 10 yields a function
linear in n if the X's are held constant and linear in the X's if n is held
constant. It is also useful to choose the levels of the constants so that
the values of $A_j$ chosen for study correspond to a set of equally spaced
symmetric values of n centered at zero.

Other experimental variables may be introduced into the design in 
a factorial or other manner. In practice, however, if many points in 
the simplex are chosen for study this will lead to very large numbers of 
treatment combinations. 


Analysis of variance. 


Suppose a mixed level factorial experiment consists of the
combinations of three factors denoted S, L and A at s, l and a levels
respectively. The factor S is somewhat unusual and consists of s points
of the simplex design, the factor L of l different levels of equivalent
dose and A an additional factor at a levels. The complete design matrix
therefore has N = s·l·a rows and (k − 1) + 2 columns where k is the
number of substances entering the simplex design. For each factor an
orthogonal set of comparisons including the identity may be drawn up
and tabulated as an orthogonal matrix, or as the product of an
orthogonal matrix and a diagonal matrix to preserve round numbers.
Suppose these matrices be arranged so the columns present comparisons,
the first column consisting only of unity, i.e. the identity.
Successive columns or comparisons may be numbered $S^0, S^1, S^2,
\ldots$, where the superscript 0 denotes the identity and the other
superscript has a currency of at most the number of degrees of freedom
of the factor levels under consideration. The full set of orthogonal
comparisons appropriate to the N treatment combinations may be obtained
by the direct product (see Tocher, 1952 for a definition) of these
matrices followed by an appropriate permutation of the columns. Since
in general only main effects and first order interactions are required,
other degrees of freedom going into an estimate of error or being
isolated (see Fisher, 1951), only part of this product need be carried
out. Main effect degree of freedom comparisons are obtained by the
direct product of the column under consideration with all other
identity columns. First order interaction comparisons are obtained by
the direct product of the two individual main effect comparisons under
consideration with the remaining identities. The matrix resulting from
these direct products will consist of the first columns of an
orthogonal matrix or the product of an orthogonal matrix with a
diagonal matrix since the direct product of orthogonal matrices is
orthogonal. The sum of squares attributable to the individual
comparisons may be determined in the standard manner. For a binomial
variable the appropriate procedures have been described by Claringbold,
Biggers and Emmens (1953).
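
The direct-product construction described above can be sketched numerically. The following is only an illustration under assumed inputs: the three comparison matrices (S at three points, L at three levels, A at two levels) and the helper name `comparison` are hypothetical and are not taken from the paper. The first column of each matrix is the identity.

```python
import numpy as np

# Hypothetical comparison matrices; column 0 of each is the identity.
S = np.array([[1, -1,  1],
              [1,  0, -2],
              [1,  1,  1]])   # factor S at 3 points
L = np.array([[1, -1,  1],
              [1,  0, -2],
              [1,  1,  1]])   # factor L at 3 levels
A = np.array([[1, -1],
              [1,  1]])       # factor A at 2 levels


def comparison(s_col, l_col, a_col):
    """Direct product of one chosen column from each factor's matrix."""
    return np.kron(np.kron(S[:, s_col], L[:, l_col]), A[:, a_col])


# Main effect of S (first comparison): S^1 with the identities of L and A.
main_S1 = comparison(1, 0, 0)
# First order interaction S^1 x L^1 with the identity of A.
inter_S1_L1 = comparison(1, 1, 0)

# The full direct product; its columns are mutually orthogonal.
full = np.kron(np.kron(S, L), A)
gram = full.T @ full
```

Only the columns actually needed (main effects and first order interactions) have to be formed with `comparison`; `full` is built here solely to show that the complete set is orthogonal.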

When k = 2, several sets of orthogonal comparisons have been
determined for the purpose of detecting departures from linearity of
response on dose. These are given for three cases, namely where one,
two and three points are equally spaced on the line joining the two
vertices.


Name    One point       Two points          Three points

S^0      1   1   1       1   1   1   1       1   1   1   1   1
S^1     -1   0   1      -1   0   0   1      -1   0   0   0   1
S^2      1  -2   1      -1   1   1  -1       1   0  -2   0   1
S^3                      0  -1   1   0       2  -3   2  -3   2
S^4                                          0  -1   0   1   0

The first row (since the matrices have been transposed for
convenience) is the identity. The second tests whether equivalent doses
were given. The third tests whether the mid-point response(s) falls
on the line joining the control responses. The last comparison in each
case is determined by those already made. An example of the use of
these coefficients is given by Claringbold and Biggers (1955).

Sets of comparisons may be determined for other cases where k is 
greater than two and when symmetric arrays of points in the simplex 
have been chosen. 


Regression analysis. 


A response transformate may be directly related to functions of 
the coordinates of the design matrix by a weighted regression analysis. 
The information matrix in this case is not diagonal since the sums of 
squares and cross-products of the coordinates of the design matrix are 
not in general independent. 


EXAMPLE 


The data are summarised in Table 2 together with the coordinates
of the experimental design. The plan of the simplex design used in
this experiment is shown in Fig. 1. The complete design is in the form
of two equilateral triangular prisms, one for each replicate. Each
prism has experimental points on three equidistant triangular planes.
Since equal doses of oestrone, oestradiol-3:17β and oestriol are
approximately equivalent in their effect on response when administered
intravaginally no logarithmic transformations are used.


FIG. 1.

Plan of the two-dimensional simplex design used in the example. Points
A, D and G correspond to the administration of oestrone,
oestradiol-3:17β and oestriol alone, respectively. Points on the lines
joining these vertices correspond to the administration of two
oestrogen mixtures, while points within the triangle correspond to
mixtures of three oestrogens.


TABLE 2

Percentage response of groups of 12 ovariectomized mice to joint
intravaginal administration of oestrone, oestradiol and oestriol. The
equivalent doses of these oestrogens denoted A_1, A_2 and A_3 were
chosen so that

A_1 = A_2 = A_3 = 0.75 × 10^-4 μg. when X_L = -1
                = 1.50 × 10^-4 μg. when X_L = 0
                = 3.00 × 10^-4 μg. when X_L = 1.

Original coordinates   Point in    Coordinates            Response
 x1     x2     x3      simplex      X1      X2     X_L = -1  X_L = 0  X_L = 1

First replicate, X_R = -1
 1      0      0          A          2       0        17       42       83
 2/3    1/3    0          B          1      t/3        0       33       75
 1/3    2/3    0          C          0     2t/3       33       33       75
 0      1      0          D         -1       t        58       58      100
 0      2/3    1/3        E         -1      t/3       17       33       67
 0      1/3    2/3        F         -1     -t/3       33       33       58
 0      0      1          G         -1      -t        25       50       42
 1/3    0      2/3        H          0    -2t/3       25       42       42
 2/3    0      1/3        I          1     -t/3        0       25       75
 1/3    1/3    1/3        J          0       0        17       25       58

Second replicate, X_R = 1
 1      0      0          A          2       0        42       50       75
 1/2    1/2    0          K         1/2     t/2       17       33       83
 0      1      0          D         -1       t        75       67       83
 0      1/2    1/2        L         -1       0        33       42       67
 0      0      1          G         -1      -t        50       42      100
 1/2    0      1/2        M         1/2    -t/2       17       42       58
 2/3    1/6    1/6        N          1       0        33       33       58
 1/6    2/3    1/6        O        -1/2     t/2       50       50       58
 1/6    1/6    2/3        P        -1/2    -t/2       33       33       50
 1/3    1/3    1/3        J          0       0        17       42       42

where t = √3


The empirical angular response (Y) is related to ten functions of the
coordinates of the design, by the following regression equation,

$$Y = \beta_0 + \beta_R X_R + \beta_1 X_1 + \beta_2 X_2 + \beta_L X_L + \beta_{12} X_1 X_2 + \beta_{1L} X_1 X_L + \beta_{2L} X_2 X_L + \beta_{11} X_1^2 + \beta_{22} X_2^2$$


The information matrix was determined for these ten parameters and
was inverted to give the variance-covariance matrix (Table 3). The
matrix inversion was carried out using the method of Fox (1950) and
Fox and Hayes (1952). In Table 4 the estimates of regression
coefficients are tabulated together with their standard errors and test
of significance.


TABLE 3

Variance-covariance matrix for the experimental data and design given
in Table 2. The theoretical variance used in its formation is that
tabulated by Claringbold, Biggers and Emmens (1953) for the empirical
angular transformation.

[Matrix layout not recoverable; legible entries: -0.106, 1.261, 0.056,
0.056, 3.182, -0.605, 1.727, 1.727, -0.605, -1.473, 1.473.]


TABLE 4

Regression analysis of the data of Table 2 following the
empirical angular transformation.

Regression     Least square
coefficient    estimate ± s.e.       t       P

β_0            35.39
β_R             2.61 ± 1.12         2.3     0.02 > P > 0.01
β_1             1.93 ± 1.78         1.1     0.3 > P > 0.2
β_2             2.84 ± 1.69         1.7     0.1 > P > 0.05
β_L            12.30 ± 1.37         9.0     P < 0.001
β_12           -0.90 ± 1.96         0.5     0.7 > P > 0.6
β_1L            3.12 ± 1.41         2.2     0.05 > P > 0.02
β_2L            0.54 ± 1.41         0.4     0.7 > P > 0.6
β_11            3.20 ± 1.32         2.4     0.02 > P > 0.01
β_22            4.21 ± 1.32         3.2     0.01 > P > 0.001

Deviations from regression: χ² = 49.7, 0.7 > P > 0.5.


Both estimates of regression on the quadratic functions of the simplex
coordinates are significantly positive. This indicates that the
response to mixtures becomes smaller as the centroid of the simplex,
which corresponds to a 1/3 : 1/3 : 1/3 mixture of the three oestrogens,
is approached, and shows that the oestrogens have a mutually
antagonistic action. The physiological significance of these findings
is discussed by Claringbold (1955).
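
The test ratios in Table 4 appear to be each estimate divided by its standard error. A minimal sketch of that arithmetic follows, using the estimates and standard errors printed in the table. The dictionary keys are an assumed reading of the coefficient subscripts (R for replicate, L for dose level) and are labels for illustration only.

```python
# Estimate and standard error for each coefficient of Table 4
# (the constant term has no standard error printed and is omitted).
# Keys are an assumed reading of the subscripts, not the paper's notation.
table4 = {
    "R":  (2.61, 1.12),
    "1":  (1.93, 1.78),
    "2":  (2.84, 1.69),
    "L":  (12.30, 1.37),
    "12": (-0.90, 1.96),
    "1L": (3.12, 1.41),
    "2L": (0.54, 1.41),
    "11": (3.20, 1.32),
    "22": (4.21, 1.32),
}

# Absolute ratio of estimate to standard error, to one decimal place.
ratios = {name: round(abs(est / se), 1) for name, (est, se) in table4.items()}

for name, ratio in ratios.items():
    print(f"beta_{name}: {ratio}")
```

The two quadratic terms (keys "11" and "22") give the ratios on which the conclusion about antagonistic action rests.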


DISCUSSION 


The simplex design in itself is a non-factorial design and may be 
criticised on these grounds. Factorial experiments in joint action 
studies lead to complex response surfaces even if one drug behaves 
simply as a dilution of the other (i.e. similar action, see Finney, 1952). 
Suppose a factorial experiment is designed for two factors (A, A') each
at three levels. Suppose as a theoretical example both factors are
simply doses of the one hormone, i.e., similar action must hold, and
also suppose that response is linearly related to log dose. A possible
design could be:


                     Dose of A (units)         Log_2 total dose
                      1     2     4
Dose of A'    1       2     3     5         1.00   1.59   2.32
(units)       2       3     4     6         1.59   2.00   2.58
              4       5     6     8         2.32   2.58   3.00


The total dose administered to each animal in the nine groups of animals 
is shown in the body of the table, while the log total dose is shown as a 
subsidiary block of mixtures in one-one correspondence to the first 
block. If response is linear to log dose it must be proportional to these 
elements apart from some constant. Thus in the simplest case a curved 
response surface must be evaluated. Also if treatments consisting of 
one substance or control treatments are included they create difficulties 
since the log of zero is −∞. The data must be analysed, therefore,
in a number of disconnected steps. Plackett and Hewlett (1951) use 
this method and their analysis takes the following form: 


1. Fit one substance dose response lines. 
2. Predict on basis of alternative models the response to joint doses. 
3. Choose the hypothesis which describes the observed data best. 

Using the method described in this paper in the theoretical example,
the design could be:


                           Total dose (units)
               1                   2                   4
  p     Dose of  Dose of    Dose of  Dose of    Dose of  Dose of
           A       A'          A       A'          A       A'
  0        0       1           0       2           0       4
 1/3      1/3     2/3         2/3     4/3         4/3     8/3
 2/3      2/3     1/3         4/3     2/3         8/3     4/3
  1        1       0           2       0           4       0


At each level of total dose administered four different methods of 
dividing the dose are shown. In the present example these must give 
equal responses and the response surface is easy to describe. Thus 
the present method has the advantages: 

1. The one-substance treatments are simply special cases of the 
definition of a joint dose. 

2. The data may be efficiently analysed in one step. 

3. Similar action is indicated (in general) by no significant de- 
partures from linearity of response to log dose. 
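
The contrast between the two layouts in the theoretical example can be checked with a few lines of arithmetic. This is a sketch only, assuming dose levels 1, 2 and 4 units for each factor and splitting proportions 0, 1/3, 2/3 and 1; the variable names are illustrative.

```python
from fractions import Fraction

levels = [1, 2, 4]

# Factorial layout: every dose of A combined with every dose of A'.
factorial_totals = [[a + a_prime for a in levels] for a_prime in levels]
distinct_factorial = sorted({t for row in factorial_totals for t in row})

# Layout of this paper: fix the total dose, then divide it between
# A and A' in the proportions p : (1 - p).
proportions = [Fraction(0), Fraction(1, 3), Fraction(2, 3), Fraction(1)]
simplex = {
    total: [(p * total, (1 - p) * total) for p in proportions]
    for total in levels
}

print(distinct_factorial)
for total, splits in simplex.items():
    print(total, [(str(a), str(b)) for a, b in splits])
```

Under similar action the response depends only on total dose, so the nine factorial cells fall on six different totals that are unequally spaced on the log scale, while every split at a given total in the second layout must give the same response.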

Finney (1952) uses an equation similar to equation 8 of this paper 
in the study of joint action. Instead of defining the transformation in 
terms of the actual doses administered it is defined in terms of relative 
potency, which is subject to estimation. If exactly equivalent doses 
were administered the transformation used here would be equivalent 
to that of Finney. Definition in terms of relative potency immediately 
restricts the study to the joint action of substances which give parallel 
dose response lines. Using the methods of this paper, series of ap- 
proximately equivalent doses may be defined by appropriate geometric 
progressions and without any reference to relative potency. Claringbold 
and Biggers (1955) give an example where the joint administration of 
oestrone by two routes is studied. Here the slopes of the separate dose 
response lines are very different but the present method allowed ready 
interpretation of the response surface. In the present work although 
the mathematical equations do not demand equivalence of the doses 
administered this is desirable since the interval of linear relation of 
response to log dose is usually restricted. Thus although small de- 
partures from equivalence will not invalidate the present method large 
departures will lead to response surfaces difficult to evaluate. 

I wish to thank Professor C. W. Emmens for advice during the course 
of this study. 
STATISTICAL ANALYSIS OF MULTIPLE SLOPE
RATIO ASSAYS


C. G. BARRACLOUGH


Kraft Foods Limited, Melbourne, Australia 


1. Introduction. 


The statistical analysis of slope ratio assays for one test solution 
has been described in detail by Finney (1952), and routine methods 
of computation fully outlined, including tests for statistical and funda- 
mental validity. Clarke (1952) has given a method for assays involving 
any number of test preparations, and this paper describes an adaptation 
of Clarke’s procedure using response totals directly to compute the
various slopes, and gives a further analysis of the sum of squares
associated with the test for fundamental validity. The method of analysis
given here applies only to assays in which the response for each prepara- 
tion is a linear function of the dose. The assay design must be com- 
pletely symmetrical, i.e. there must be equal spacing between the dose 
levels for each preparation, the same number of dose levels for each 
preparation, and equal replication for all treatments. It is generally 
preferable to run a test at the zero dose level since this gives improved 
tests for validity, Finney (1952), Wood and Finney (1946); but the 
suggested method is developed to cover assays with, and without, 
tests at the zero dose level. 


2. Notation. 


An assay with a test at the zero dose level is termed an (rk + 1)
assay, while an assay without a zero dose level test is termed an (rk)
assay.

Let $x_{ij}$ = working dose, where i = 1, 2, 3, ..., r is the preparation
and


j = 0, 1, 2, ..., k is the dose level for an (rk + 1) assay


j = 1, 2, ..., k is the dose level for an (rk) assay


The working scales are chosen so that the highest dose of each
preparation is taken as unity, i.e. $x_{ij}$ assumes the values
0, 1/k, 2/k, ..., 1 for each preparation in an (rk + 1) assay.
If $R'_i$ = potency estimate of the ith preparation on the working
scales and


$X_s$ = highest dose of standard preparation (taken as preparation 1)


$X_i$ = highest dose of ith preparation ($X_i$ is usually but not
necessarily the same for each test preparation)


then


$$R_i = R'_i \, X_s / X_i$$


where $R_i$ is the true potency estimate.
Let $T_{ij}$ = response total for n replications of the dose $x_{ij}$, then


$$H_i = (2k - 2)T_{i1} + (2k - 5)T_{i2} + (2k - 8)T_{i3} + \cdots - (k - 1)T_{ik} \tag{1}$$


where $H_i$ is termed the intersection value for the ith preparation. The
intersection value $H_i$ is equal to the expected zero dose response
total, multiplied by [k(k − 1)]/2, for the ith preparation, as estimated
by a straight line fitted through the non zero dose response totals for
that preparation.
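
The description of the intersection value can be checked numerically. The sketch below assumes that the weight on the jth dose total is (2k + 1 − 3j), which gives 4, 1, −2 for k = 3 as used in the numerical example, and compares it with an ordinary least squares intercept. The function names are illustrative.

```python
from fractions import Fraction


def intersection_weights(k):
    """Weights on T_i1 ... T_ik, assumed to be (2k + 1 - 3j)."""
    return [2 * k + 1 - 3 * j for j in range(1, k + 1)]


def fitted_zero_dose_total(totals):
    """Intercept of the least squares line through the non zero dose totals."""
    k = len(totals)
    doses = [Fraction(j, k) for j in range(1, k + 1)]
    x_bar = sum(doses) / k
    y_bar = sum(totals) / k
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(doses, totals))
    sxx = sum((x - x_bar) ** 2 for x in doses)
    return y_bar - (sxy / sxx) * x_bar


# Response totals for the standard preparation in the numerical example.
totals = [Fraction("9.5"), Fraction("12.5"), Fraction("15.2")]
k = len(totals)

H = sum(w * t for w, t in zip(intersection_weights(k), totals))
scaled_intercept = fitted_zero_dose_total(totals) * Fraction(k * (k - 1), 2)
print(float(H), float(scaled_intercept))
```

Exact fractions are used so that the two quantities can be compared without rounding error.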

The following symbols have been used for sums of squares and 
products for the doses and responses, to shorten the formulae. 


t=1 j=0 


(2) 


k 


t=1 


(3) 


ll 


r k 


i=1 j=0 


k 
Visi 


where $\bar{x} = \dfrac{k + 1}{2(rk + 1)}$ for an (rk + 1) assay


and $\bar{x} = \dfrac{k + 1}{2rk}$ for an (rk) assay
In the tests for validity orthogonal contrasts have been designated


$L_B$ = contrast associated with the test for “blanks”


$L_{I_s}$ = contrasts associated with the tests for “intersection” where
s runs from 1 to (r − 1).


3. General Formulae. 


The formal procedure for either the (rk + 1) or (rk) type of assay 
involves the fitting of a multiple linear regression equation of the form 


$$Y = a + b_1 x_1 + b_2 x_2 + \cdots + b_r x_r$$
where Y is the estimated mean response to the doses $x_1$, $x_2$, etc., of
the various preparations, while $b_1$, $b_2$, etc. are the estimated
increases in response per unit increase in dose of the corresponding
preparations. The potency estimate $R'_i$ on the working scales is given by


$$R'_i = b_i / b_1$$
This is equivalent to fitting separate straight lines through the responses 
for each preparation, with the restriction that they all intersect at the 
zero dose level, and then obtaining the potency estimates from the 
ratios of the slopes of the lines for the test preparations to the slope of 
the line for the standard preparation. The formal method estimates 
the $b_i$ values from the following type of equation;


where the $v_{ij}$ are the elements of the variance and covariance matrix.
From the expression (4) for $S_{iT}$ it can be seen that each term in (5)
involves every $T_{ij}$, i.e. $b_i$ can be expressed as a linear function
of the $T_{ij}$ values.
r k 
(6) = Minis T 
i=1 7=0 
The values of the coefficients in (6) can be determined from (4) and 


(5) if the elements of the inverse matrix can be obtained in a con- 
venient form. If we write 


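$$A = \begin{pmatrix} S_{11} & S_{12} & S_{12} & \cdots \\ S_{12} & S_{11} & S_{12} & \cdots \\ S_{12} & S_{12} & S_{11} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$
where A is a square matrix of order r, then the variance and covariance
matrix is given by


$$A^{-1} = M \begin{pmatrix} c & d & d & \cdots \\ d & c & d & \cdots \\ d & d & c & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$
where
$$M = \frac{1}{(S_{11} - S_{12})(S_{11} + (r - 1)S_{12})}$$
$$c = S_{11} + (r - 2)S_{12}$$
$$d = -S_{12}.$$
Since
$$S_{11} = \frac{n(k + 1)(2k + 1)}{6k} - \frac{n(k + 1)^2}{4(rk + 1)}$$
and
$$S_{12} = -\frac{n(k + 1)^2}{4(rk + 1)}$$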


for an (rk + 1) assay, where n is the number of replications, the values
of M, c, d can be readily calculated for any values of k and r. Table
4 contains values of M, c, d for r = 2 to 10, and k = 2, 3 for an
(rk + 1) assay. They are used directly to determine the fiducial limits
for $R_i$ as discussed in the numerical example, and have been used in
deriving the values of the multipliers. Table 5 contains the values for
an (rk) assay obtained in the same way except that (rk + 1) is replaced
by (rk) in the expressions for $S_{11}$ and $S_{12}$.
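
A short check of the inverse-matrix elements is sketched below. It assumes the patterned form described above, with $S_{11}$ on the diagonal of A and $S_{12}$ elsewhere, M equal to the reciprocal of $(S_{11} - S_{12})(S_{11} + (r-1)S_{12})$, $c = S_{11} + (r-2)S_{12}$ and $d = -S_{12}$. The function names are illustrative, and the values are left unscaled rather than reduced to the round numbers of the tables.

```python
from fractions import Fraction


def sums_of_squares(r, k, n=1, zero_dose=True):
    """S11 and S12 for an (rk + 1) assay, or an (rk) assay if zero_dose is False."""
    treatments = r * k + 1 if zero_dose else r * k
    correction = Fraction(n * (k + 1) ** 2, 4 * treatments)
    s11 = Fraction(n * (k + 1) * (2 * k + 1), 6 * k) - correction
    return s11, -correction


def inverse_elements(r, k, n=1, zero_dose=True):
    """Return (M, c, d) so that the inverse of A is M times the c/d pattern."""
    s11, s12 = sums_of_squares(r, k, n, zero_dose)
    M = 1 / ((s11 - s12) * (s11 + (r - 1) * s12))
    return M, s11 + (r - 2) * s12, -s12


def product_with_inverse(r, k, n=1, zero_dose=True):
    """Multiply A by the claimed inverse; the result should be the identity."""
    s11, s12 = sums_of_squares(r, k, n, zero_dose)
    M, c, d = inverse_elements(r, k, n, zero_dose)
    A = [[s11 if i == j else s12 for j in range(r)] for i in range(r)]
    V = [[M * (c if i == j else d) for j in range(r)] for i in range(r)]
    return [[sum(A[i][t] * V[t][j] for t in range(r)) for j in range(r)]
            for i in range(r)]


M, c, d = inverse_elements(r=2, k=2)
print(c / d)   # ratio of diagonal to off-diagonal element
```

For r = 2, k = 2 the ratio c : d comes out as 16 : 9, which agrees with the c and d entries legible in Table 4 for that case.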

The values of M, c, d could be combined with expression (5) to
calculate the multipliers; but the symmetrical nature of the design
permits (6) to be expressed more simply as


$$b_i = m_0 T_0 + \sum_{j=1}^{k} m_j T_{ij} + \sum_{j=1}^{k} p_j \sum_{t=1}^{r} T_{tj} \tag{7}$$

$T_0$ is used for the total response for the replications at the zero
dose level since it is common to all preparations. The general form
of (7) is the same for the two types of assay except that there is no
term involving $T_0$ for an (rk) assay; but the multipliers have
different values for the two types of assay. Values of the multipliers
for an (rk + 1) assay with r = 2 to 10, and k = 2, 3, are given in
Table 6, while the multipliers for an (rk) assay are given in Table 7.
A common factor has been removed from each set of multipliers and is
given in a separate column.

In routine assay work it is usually unnecessary to calculate the sum 
of squares due to regression since the regression will always be signifi- 
cant; but it is essential that validity tests be carried out on every assay. 
The most suitable tests are those for statistical and fundamental 
validity, generally referred to as ‘blanks’ and ‘intersections’ and dis- 
cussed by Finney (1951) (1952), for the case of two preparations. 

For r preparations the sum of squares in an analysis of variance 
associated with the test for blanks still has one degree of freedom but 
that for intersections has (r — 1) degrees of freedom. The sum of 
squares for intersections can be further divided into (r — 1) orthogonal 
contrasts, each with 1 degree of freedom, and these contrasts can be 
associated with specific tests among the intersection values for the 
various preparations. In a composite test for intersections it is possible 
that a significant result will be obtained, leading to the rejection of the 
whole analysis, when actually only one preparation is at fault. A suitable 
subdivision of the sum of squares for intersection would permit the 
isolation of the effect due to the faulty preparation, and if the remainder 
of the sum of squares for intersection was not significant the results of 
the assay could be recomputed to obtain valid results, after omitting 
the results for the faulty preparation. It is possible, though not very 
likely, that the composite sum of squares for intersection could give a 
non significant test, although one of the components would be significant 
if the subdivision was carried out. For (rk) assays there is no test for 
blanks; but the tests for intersections and the subdivision of the sum 
of squares for intersection can be carried out in the same manner as 
for an (rk + 1) assay. 

The r sums of squares are most conveniently obtained by using a
table of orthogonal coefficients of the following form, in conjunction
with the H values obtained from (1).

Contrast          T_0           H_1       H_2       H_3     ...   H_{r-1}    H_r

L_B           rk(k - 1)/2       -1        -1        -1      ...     -1       -1
L_{I_1}            0          (r - 1)     -1        -1      ...     -1       -1
L_{I_2}            0             0      (r - 2)     -1      ...     -1       -1
...
L_{I_{r-1}}        0             0         0         0      ...      1       -1

The orthogonality of the contrasts can be most clearly seen from
the table but general formulae can be given for the contrasts.


$$L_B = \frac{rk(k - 1)}{2}\,T_0 - \sum_{i=1}^{r} H_i \tag{8}$$

$$\text{Divisor} = \frac{nrk(k - 1)}{4}\,[2(2k + 1) + rk(k - 1)]$$

$$L_{I_s} = (r - s)H_s - \sum_{i=s+1}^{r} H_i \qquad \text{s runs from 1 to } (r - 1) \tag{9}$$

$$\text{Divisor} = \frac{nk(k - 1)(2k + 1)(r - s + 1)(r - s)}{2}$$

The sum of squares associated with each degree of freedom =
$L^2$/divisor

If k ≥ 3 the deviations from linearity for the non zero doses can be
calculated separately for each preparation using standard orthogonal
coefficients. For (rk) assays it is essential to have k ≥ 3 so that
tests for deviations from linearity for the non zero doses can be
carried out, since there is no test for blanks. If the blanks component
in an (rk + 1) assay is significant but the intersections component not
significant, it would be possible to discard the zero dose figures and
analyse the remaining data as an (rk) assay, provided k ≥ 3. It is
suggested that k = 3 is the best general purpose design since it has
been shown by Finney (1952) that the efficiency of a slope ratio assay
falls rapidly with increasing values of k.


4. Numerical Example. 


The data used in the example are taken from the results of an assay 
of niacin in yeast extracts. Five preparations were used, one standard 
and four test, each at three levels, and a zero dose level test was in- 
cluded. There were two replications, giving sixteen degrees of freedom 
for the error estimate. The assay is based on the measurement of the 
acidity produced by a culture of Lactobacillus arabinosus, Barton 
Wright (1952), on a medium to which niacin has been added. 

The figures given in Table 1 are the titres in mls. of N/10 sodium 
hydroxide for each tube, while in Table 2 the duplicate measurements 
have been totalled, and set out in a form more suitable for the compu- 
tations. 


TABLE 1

Preparation 1.             Preparation 2.            Preparation 3.
0 μg.      3.2, 3.5        1 ml.    4.2, 4.7         1 ml.    3.8, 4.4
0.05 μg.   4.7, 4.8        2 ml.    5.0, 5.0         2 ml.    5.2, 5.4
0.10 μg.   6.2, 6.3        3 ml.    6.1, 6.1         3 ml.    6.2, 6.6

Preparation 4.             Preparation 5.
1 ml.    4.0, 4.0          1 ml.    4.2, 4.3
2 ml.    5.1, 5.6          2 ml.    5.2, 4.8
3 ml.    6.0, 6.1          3 ml.    6.1, 6.3
TABLE 2

              Prepn. 1    Prepn. 2    Prepn. 3    Prepn. 4    Prepn. 5    Totals
              Standard
Zero Level       6.7                                                        6.7
Level 1          9.5         8.9         8.2         8.0         8.6       43.2
Level 2         12.5        10.0        10.6        10.7        10.0       53.8
Level 3         15.2        12.2        12.8        12.1        12.4       64.7
b'_i           897.6       576.4       627.0       580.8       583.0
R'_i                       0.6422
R_i                        0.03211
H_i             20.1        21.2        17.8        18.5        19.6       97.2

The multipliers necessary to calculate the slopes are obtained from
Table 6, for r = 5, k = 3. There is no need to use the common factor
given in Table 6, since we are only interested in the ratio of the
slopes. For this reason the slopes are denoted by $b'_i$ instead of $b_i$.


$$b'_1 = -42T_0 + 22T_{11} + 44T_{12} + 66T_{13} - 24\sum_{i=1}^{5} T_{i1} - 6\sum_{i=1}^{5} T_{i2} + 12\sum_{i=1}^{5} T_{i3}$$


     = −(42 × 6.7) + 22[9.5 + (2 × 12.5) + (3 × 15.2)]
       − 6[(4 × 43.2) + 53.8 − (2 × 64.7)]
     = 1762.2 − 864.6 = 897.6
b'_2 = 22[8.9 + (2 × 10.0) + (3 × 12.2)] − 864.6 = 576.4
R'_2 = 576.4/897.6 = 0.6422
X_s = 0.15 μg. niacin
X_2 = 3 mls. test solution


R_2 = 0.6422 × 0.15/3 = 0.03211 μg. niacin per ml. test solution.


The values of $H_i$ are obtained from (1) as
$$H_i = 4T_{i1} + T_{i2} - 2T_{i3}$$
Thus H_1 = (4 × 9.5) + 12.5 − (2 × 15.2) = 20.1

A complete analysis of variance for the data is given in Table 3
but in routine assay work it is not necessary to compute the whole
analysis. The essential parts are the error estimate obtained from the
sum of squares within treatments, and the individual sums of squares
for blanks, intersections, and deviations from linearity if k ≥ 3.


TABLE 3
Analysis of Variance.

                       Degrees of     Sums of       Mean
Source of Variance      Freedom       Squares      Squares

Between treatments        15          37.0650      2.4710
  Regression               5          36.5429      7.3086
  L_B                      1           0.0165      0.0165
  L_{I_1}                  1           0.0130      0.0130
  L_{I_2}                  1           0.1176      0.1176
  L_{I_3}                  1           0.0248      0.0248
  L_{I_4}                  1           0.0144      0.0144
  Q_1                      1           0.0075      0.0075
  Q_2                      1           0.1008      0.1008
  Q_3                      1           0.0033      0.0033
  Q_4                      1           0.1408      0.1408
  Q_5                      1           0.0833      0.0833
Within treatments         16           0.7100      0.0444
Total                     31          37.7750


Using (8)

$$L_B = 15T_0 - \sum_{i=1}^{5} H_i = 100.5 - 97.2 = 3.3$$

Divisor = 660
∴ Sum of squares for blanks = (3.3)²/660 = 0.0165


Using (9) we obtain

$$L_{I_1} = 4H_1 - \sum_{i=2}^{5} H_i = 80.4 - 77.1 = 3.3$$

Divisor = 840
∴ Sum of squares for the one degree of freedom corresponding to
$L_{I_1}$ = (3.3)²/840 = 0.0130
TABLE 4
Elements of inverse matrix for (rk + 1) assay.

 r     k        M         c      d
 2     2      4/35n      16      9
 3     2      4/40n      17      9
 4     2      4/45n      18      9
 5     2      4/50n      19      9
 6     2      4/55n      20      9
 7     2      4/60n      21      9
 8     2      4/65n      22      9
 9     2      4/70n      23      9
10     2      4/75n      24      9
 2     3      9/182n     31     18
 3     3      9/224n     34     18
 4     3      9/266n     37     18
 5     3      9/308n     40     18
 6     3      9/350n     43     18
 7     3      9/392n     46     18
 8     3      9/434n     49     18
 9     3      9/476n     52     18
10     3      9/518n     55     18
TABLE 5
Elements of inverse matrix for (rk) assay.

 r     k        M         c      d
 2     2      4/10n      11      9
 3     2      4/15n      12      9
 4     2      4/20n      13      9
 5     2      4/25n      14      9
 6     2      4/30n      15      9
 7     2      4/35n      16      9
 8     2      4/40n      17      9
 9     2      4/45n      18      9
10     2      4/50n      19      9
 2     3      9/28n       8      6
 3     3      9/42n       9      6
 4     3      9/56n      10      6
 5     3      9/70n      11      6
 6     3      9/84n      12      6
 7     3      9/98n      13      6
 8     3      9/112n     14      6
 9     3      9/126n     15      6
10     3      9/140n     16      6
$L_{I_1}$ is a test of the average intersection value for the test
solutions against the intersection value for the standard preparation.
The other $L_{I_s}$ contrasts are comparisons between the intersection
values for the various test solutions. Since k = 3, only quadratic
components of deviations from linearity for the non zero dose levels
will exist.
Thus for Preparation 1


$$Q_1 = \frac{(T_{11} - 2T_{12} + T_{13})^2}{6n} = \frac{(9.5 - 25.0 + 15.2)^2}{12} = 0.0075$$


None of the mean squares for testing validity is significant so the 
data are consistent with the multiple regression equation, and the 
potency estimates are valid. 
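
The arithmetic of the numerical example can be retraced from the response totals of Table 2. The sketch below is an illustration only: it uses the multipliers quoted in the text for r = 5, k = 3, takes each divisor as n times the sum of squared coefficients on the response totals (an assumption that reproduces the divisors 660 and 840 quoted above), and uses variable names that are not the paper's.

```python
# Response totals from Table 2: n = 2 replications, r = 5 preparations, k = 3.
n, T0 = 2, 6.7
totals = [
    [9.5, 12.5, 15.2],   # preparation 1 (standard)
    [8.9, 10.0, 12.2],
    [8.2, 10.6, 12.8],
    [8.0, 10.7, 12.1],
    [8.6, 10.0, 12.4],
]
r, k = len(totals), 3
level_sums = [sum(row[j] for row in totals) for j in range(k)]

# Unscaled slopes b'_i from the multipliers for r = 5, k = 3.
m0, m, p = -42, [22, 44, 66], [-24, -6, 12]
common = m0 * T0 + sum(pj * sj for pj, sj in zip(p, level_sums))
slopes = [common + sum(mj * t for mj, t in zip(m, row)) for row in totals]
working_potency = [b / slopes[0] for b in slopes]

# Intersection values H_i = 4 T_i1 + T_i2 - 2 T_i3.
weights = [4, 1, -2]
H = [sum(w * t for w, t in zip(weights, row)) for row in totals]
h_ss = sum(w * w for w in weights)   # 21, squared coefficients within one H_i

# Blanks contrast: 15 T_0 minus the sum of the H_i.
blank_coef = r * k * (k - 1) // 2
L_B = blank_coef * T0 - sum(H)
ss_blanks = L_B ** 2 / (n * (blank_coef ** 2 + r * h_ss))

# Intersection contrasts: (r - s) H_s minus the later H_i, s = 1 ... r - 1.
ss_intersections = []
for s in range(1, r):
    L = (r - s) * H[s - 1] - sum(H[s:])
    divisor = n * h_ss * ((r - s) ** 2 + (r - s))
    ss_intersections.append(L ** 2 / divisor)

# Quadratic deviation from linearity within each preparation.
ss_quadratic = [(t1 - 2 * t2 + t3) ** 2 / (6 * n) for t1, t2, t3 in totals]

print([round(b, 1) for b in slopes])
print(round(ss_blanks, 4), [round(x, 4) for x in ss_intersections])
print([round(x, 4) for x in ss_quadratic])
```

Each printed sum of squares carries one degree of freedom and is compared with the within-treatments mean square of Table 3.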


TABLE 6
Multipliers for an (rk + 1) assay.

                                                             Common
 k     r     m_0    m_1    m_2    m_3     p_1    p_2    p_3    Factor
 2     2     -15      7     14             -6      3           2/35n
 2     3     -15      8     16             -6      3           2/40n
 2     4     -15      9     18             -6      3           2/45n
 2     5     -15     10     20             -6      3           2/50n
 2     6     -15     11     22             -6      3           2/55n
 2     7     -15     12     24             -6      3           2/60n
 2     8     -15     13     26             -6      3           2/65n
 2     9     -15     14     28             -6      3           2/70n
 2    10     -15     15     30             -6      3           2/75n
 3     2     -42     13     26     39     -24     -6     12    3/182n
 3     3     -42     16     32     48     -24     -6     12    3/224n
 3     4     -42     19     38     57     -24     -6     12    3/266n
 3     5     -42     22     44     66     -24     -6     12    3/308n
 3     6     -42     25     50     75     -24     -6     12    3/350n
 3     7     -42     28     56     84     -24     -6     12    3/392n
 3     8     -42     31     62     93     -24     -6     12    3/434n
 3     9     -42     34     68    102     -24     -6     12    3/476n
 3    10     -42     37     74    111     -24     -6     12    3/518n

The fiducial limits for the various R values can be readily obtained 
using the approximate formula for the variance of R. 
(10) 
s’ is the error mean square, while M, c, d are the appropriate values for 
the inverse matrix obtained from Table 4. 


The approximate formula can be used provided g is less than 0.05
where


$$g = \frac{t^2 s^2 M c}{b_1^2}$$ and t is Student’s t.


In most microbiological assays this condition will be satisfied; but if 
it is not satisfied the exact limits can be obtained from Fieller’s formula, 
(Fieller 1944). The expression (10) can be slightly simplified by using
R′ instead of R.

$$V(R') = \frac{s^2 M}{b_1^2}\,[c + cR'^2 - 2dR'] \tag{11}$$

It is important to notice that the value of $b_1$ used in (10), (11) must
be the correct one, not the $b'_1$ given in Table 2.
TABLE 7
Multipliers for an (rk) assay.

                                                      Common
 k     r     m_1    m_2    m_3     p_1    p_2    p_3    Factor
 2     2       2      4             -6      3           2/10n
 2     3       3      6             -6      3           2/15n
 2     4       4      8             -6      3           2/20n
 2     5       5     10             -6      3           2/25n
 2     6       6     12             -6      3           2/30n
 2     7       7     14             -6      3           2/35n
 2     8       8     16             -6      3           2/40n
 2     9       9     18             -6      3           2/45n
 2    10      10     20             -6      3           2/50n
 3     2       2      4      6      -8     -2      4    3/28n
 3     3       3      6      9      -8     -2      4    3/42n
 3     4       4      8     12      -8     -2      4    3/56n
 3     5       5     10     15      -8     -2      4    3/70n
 3     6       6     12     18      -8     -2      4    3/84n
 3     7       7     14     21      -8     -2      4    3/98n
 3     8       8     16     24      -8     -2      4    3/112n
 3     9       9     18     27      -8     -2      4    3/126n
 3    10      10     20     30      -8     -2      4    3/140n
AN ANALYSIS OF PERENNIAL CROP DATA*


R. G. D. STEEL
Cornell University, Ithaca, N. Y. 


SUMMARY 


A bivariate analysis of variance is applied to perennial crop data. 
A test of an hypothesis about varietal effects is made. The bivariate 
analysis and a univariate analysis are compared. Two transformations 
of the data are considered. An expedient for locating varietal differences 
is proposed. 


1. Introduction 


In many fields of study, multiple observations are made on each 
individual. Treatment of such multivariate data differs. A trait-by- 
trait analysis may be made. Methods which consider several characters 
simultaneously include variance, covariance, components of variance 
and regression analysis. One of these may adequately answer the 
questions raised or test the hypotheses stated by the research worker 
when designing the experiment or survey. However, cases arise where 
none of these procedures is wholly adequate or appropriate. A multi- 
variate analysis may be both appropriate and adequate in such cases. 
The term multivariate analysis will be applied to analyses of data where 
several variables are considered jointly with none relegated to the 
position of an independent variable. 

*This paper is an expansion of a talk presented September 7, 1953, at a
meeting in Madison, Wisconsin, of the American Institute of Biological
Sciences. It is paper no. 312 of the Department of Plant Breeding and
no. 17 of the Biometrics Unit.

If such multiple observations are analyzed on the basis of separate
variables, the combination of the results of univariate tests and the
assignment of a measure of credibility to any inference drawn present
problems. Thus if the observations are perfectly correlated, the same
conclusions are drawn from each variable; if the observations are
completely independent and it is agreed to claim a difference at the 5%
level if at least one variable shows significance, then one falsely
claims significance with probability $1 - (.95)^n$ with n variables; if
the rule is to claim a difference only if all variables show
significance, then the probability of falsely claiming a difference is
$(.05)^n$ for n variables and it becomes practically impossible ever to
detect a difference. In either case, rules can be constructed and
inferences made with valid measures of credibility, but the true
situation is probably somewhere between complete dependence and
complete independence and we simply do not know the level of
significance. In a multivariate analysis, the problem of dependence is
looked after by the criterion itself.
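
The two error rates quoted for completely independent variables can be tabulated directly. This is a minimal sketch; the function name and the default level of 0.05 are illustrative.

```python
def false_claim_rates(n, alpha=0.05):
    """Chance of a false claim with n independent tests, each at level alpha.

    Returns the rate under the "at least one significant" rule and under
    the "all significant" rule.
    """
    at_least_one = 1 - (1 - alpha) ** n
    all_significant = alpha ** n
    return at_least_one, all_significant


for n in (1, 2, 5, 10):
    any_rule, all_rule = false_claim_rates(n)
    print(n, round(any_rule, 4), all_rule)
```

With two yearly yields treated as independent, the first rule gives a false claim rate of about 0.0975 and the second about 0.0025, on either side of the nominal 0.05.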

It is the purpose of this paper to consider a multivariate analysis 
of yield for a forage crop where varieties are necessarily on the same 
plots in all years. The variables will be the total yields for each of 
two years. The tests of significance used are valid at the stated levels 
of significance whether or not year to year correlations exist or residual 
variances are homogeneous with respect to years. 

Picture a graph, each point being the pair of variety means for the 
two years. If these paired means lie fairly closely together in a circle 
or ellipse, intuition suggests that they be declared not significantly 
different; whereas if they appear to lie along a line, extended rather 
excessively, intuition suggests variety differences that persist from year 
to year. For a 45° line, there appears to be no interaction of years and 
varieties whereas a line of other than 45° suggests a multiplicative effect 
of years, a special type of interaction not generally detected as such in 
the usual analysis of variance with years and its interactions as sources 
of variation. An additive year effect, solely or additionally, is indicated
if the straight line is not through the origin. For any straight line, a 
single linear combination of the two years’ yields should discriminate 
among varieties. The case where the points are scattered widely with 
little or no apparent linear correlation suggests a variety by year inter- 
action other than the above special type. Discrimination here would 
require two linear functions. 

A multivariate analysis will formally contain the ideas of the previous 
paragraph. 


2. The Multi-variate Model 


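For a randomized complete block experiment, denote the yield in
the h-th year for the i-th replicate and the j-th variety by

$$y_{ij}^{(h)} = \mu^{(h)} + \rho_i^{(h)} + \tau_j^{(h)} + \epsilon_{ij}^{(h)}, \qquad h = 1, 2,$$

where the components are, in order, the general mean, the replicate
effect, the variety effect and the error.
Since tests of significance are planned, assume the $\epsilon_{ij}^{(h)}$'s have a joint
normal distribution and, for fixed h, are independently distributed with a
common variance. Assumptions about the other additive components
may be those of the usual models.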




3. The Data* and Analysis 


The data consist of the total plot-yields for each of 1949 and 1950 
of 25 varieties of alfalfa planted in 1948 in a randomized complete block 
experiment with 4 replicates. The paired treatment means and overall 
mean for each variety are given in Table 1. The computations required 


TABLE 1
Treatment means for 25 varieties of alfalfa in tons/acre, 4 replicates

Variety      1      2      3      4      5      6
1949      3.23   2.92   3.58   3.40   3.54   2.74
1950      4.47   4.25   4.15   4.52   4.66   3.80
Mean      3.85   3.58   3.87   3.96   4.10   3.27

Variety      7      8      9     10     11     12     13
1949      2.78   3.14   3.53   3.51   3.44   3.68   3.18
1950      4.10   3.76   4.55   4.58   4.02   4.86   4.26
Mean      3.44   3.45   4.04   4.04   3.73   4.27   3.72

Variety     14     15     16     17     18     19     20
1949      3.62   3.28   3.68   3.54   3.46   3.28   3.37
1950      4.06   4.07   3.71   4.44   4.26   3.84   3.91
Mean      3.84   3.68   3.69   3.99   3.86   3.56   3.64

Variety     21     22     23     24     25   Grand
1949      2.94   3.16   3.58   3.40   3.44    3.34
1950      4.04   3.87   4.52   3.86   3.83    4.18
Mean      3.49   3.51   4.05   3.63   3.63    3.76


initially are standard analyses of variance of the data for each year and 
the cross-products of an analysis of covariance. In Table 2, the 1949 
and 1950 analyses are in the top left and lower right corners respectively, 
and the cross-products in the so-called off-diagonal. This presentation 
calls attention to the two-dimensional nature of the analysis. 


*Data obtained through courtesy of C. C. Lowe, Department of Plant Breeding, Cornell.

Without reference to tests of significance, inferences must be
qualitative. For such inferences, components of mean square (4,
Tukey, 1949) may be helpful and are given. If one calculates correlation
coefficients on the basis of components of mean squares, that for
replicates is seen to be negative and greater than 1 in absolute value,
whereas that for varieties is positive and of the order of .5.


TABLE 2
Bivariate analysis of variance

Source       d.f.          S.S.                  M.S.           Component of M.S.
Replicates     3      4.2362  -1.4819      1.4121  -0.4940      .0503  -.0213
                     -1.4819   0.9074     -0.4940   0.3025     -.0213   .0049
Varieties     24      6.9634   3.1027      0.2901   0.1293      .0339   .0225
                      3.1027  10.0438      0.1293   0.4185      .0225   .0595
Residual      72     11.1171   2.8394      0.1544   0.0394
                      2.8394  12.9941      0.0394   0.1805
Total         99     22.3167   4.4602
                      4.4602  23.9453
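
The components and correlations quoted above can be checked numerically. The following Python sketch is not part of the original analysis. It assumes the usual expectations of mean squares for a randomized complete block design with 4 replicates and 25 varieties, so that a component is the excess of a mean square over the residual mean square divided by 25 (replicates) or by 4 (varieties). The function and variable names are illustrative only.

```python
import numpy as np

# Mean-square matrices from Table 2 (rows and columns: 1949, 1950).
ms_reps = np.array([[1.4121, -0.4940], [-0.4940, 0.3025]])
ms_vars = np.array([[0.2901, 0.1293], [0.1293, 0.4185]])
ms_resid = np.array([[0.1544, 0.0394], [0.0394, 0.1805]])

n_reps, n_vars = 4, 25

# Assumed expectations: M.S.(replicates) = residual + 25 * component,
#                       M.S.(varieties)  = residual +  4 * component.
comp_reps = (ms_reps - ms_resid) / n_vars
comp_vars = (ms_vars - ms_resid) / n_reps


def component_correlation(c):
    """Correlation implied by a 2 x 2 matrix of components of mean square."""
    return c[0, 1] / np.sqrt(c[0, 0] * c[1, 1])


r_reps = component_correlation(comp_reps)
r_vars = component_correlation(comp_vars)

print(np.round(comp_reps, 4))
print(np.round(comp_vars, 4))
print(round(r_reps, 2), round(r_vars, 2))
```

Because these are estimates of components rather than of ordinary variances and covariances, nothing constrains the resulting "correlation" to lie between -1 and 1, which is what happens for replicates.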


In order to assign a measure of uncertainty to an inference about
varieties, let us test the null hypothesis that variety effects are zero
for both $y^{(1)}$ and $y^{(2)}$. The criterion is

$$U = \frac{\begin{vmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{vmatrix}}{\begin{vmatrix} E_{11} + T_{11} & E_{12} + T_{12} \\ E_{21} + T_{21} & E_{22} + T_{22} \end{vmatrix}}$$

where $E_{ij}$ and $T_{ij}$ are sums of squares and cross-products for residuals
and treatments respectively. Vertical bars indicate determinants are
to be taken. The analogy between U and F is given by Tukey (4).

$0 < U < 1$ with values near one supporting the null hypothesis and
values near zero indicating significant departures. The quantity $\sqrt{U}$
has been shown by Wilks (5, 6) to have a beta-distribution, with parameters
p and q as used by Pearson (3) equal to (residual d.f. - 1) and
variety d.f. respectively. If it is desired to use F-tables, calculate

$$F = \frac{m}{n} \cdot \frac{1 - \sqrt{U}}{\sqrt{U}}$$

with m = 2p and n = 2q corresponding to d.f. for denominator and
numerator respectively.

For this example

$$U = \frac{\begin{vmatrix} 11.1171 & 2.8394 \\ 2.8394 & 12.9941 \end{vmatrix}}{\begin{vmatrix} 18.0805 & 5.9421 \\ 5.9421 & 23.0379 \end{vmatrix}} = .3578$$

with p = (72 - 1) and q = 24. Since $\sqrt{U} = .598$,

$$F = \frac{142}{48} \times \frac{.402}{.598} = 1.99$$

with m and n = 142 and 48 respectively. Variety effects are judged
highly significant.
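
As a check on the arithmetic, the criterion and its F equivalent can be recomputed from the sums of squares and products of Table 2. This is a minimal sketch, not the author's computing routine; the names `E`, `T`, `root_U` and the use of NumPy are choices made here for illustration.

```python
import numpy as np

# Residual (E) and variety (T) sums of squares and cross-products, Table 2.
E = np.array([[11.1171, 2.8394], [2.8394, 12.9941]])
T = np.array([[6.9634, 3.1027], [3.1027, 10.0438]])

resid_df, variety_df = 72, 24

# U is the ratio of the residual determinant to the (residual + treatment) one.
U = np.linalg.det(E) / np.linalg.det(E + T)
root_U = np.sqrt(U)

# Beta parameters as described in the text, then the F equivalent.
p, q = resid_df - 1, variety_df
m, n = 2 * p, 2 * q
F = (m / n) * (1 - root_U) / root_U

print(round(U, 4), round(root_U, 3), round(F, 2), m, n)
```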

Significance raises the usual problems and the criterion U must be
more closely considered. Reconsider this criterion in the determinantal
equation

$$\begin{vmatrix} E_{11} - U(E_{11} + T_{11}) & E_{12} - U(E_{12} + T_{12}) \\ E_{21} - U(E_{21} + T_{21}) & E_{22} - U(E_{22} + T_{22}) \end{vmatrix} = 0. \tag{1}$$

This equation has two roots whose joint distribution (6) is

$$f(U_1, U_2)\, dU_1\, dU_2 = K\, (U_1 U_2)^{(n_2 - 3)/2}\, [(1 - U_1)(1 - U_2)]^{(n_1 - 3)/2}\, (U_1 - U_2)\, dU_1\, dU_2$$

where $1 > U_1 > U_2 > 0$, K is a known constant, and $n_1$ and $n_2$ are
treatment and residual d.f. respectively. From this, the distribution
of each root can be obtained and tested for significance.

The determinantal equation for the example is

$$381.2282U^2 - 457.3105U + 136.3945 = 0.$$

The roots are $U_1 = .6441$ and $U_2 = .5554$. Their joint distribution is

$$f(U_1, U_2)\, dU_1\, dU_2 = K\, (U_1 U_2)^{69/2}\, [(1 - U_1)(1 - U_2)]^{21/2}\, (U_1 - U_2)\, dU_1\, dU_2.$$

It is seen that the exact distributions of $U_1$ and $U_2$ can be obtained.
Tukey (4) works an example for an odd and even pair of d.f. (It
appears that he used location rather than variety d.f.)
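
The quadratic for the example follows from expanding the 2 x 2 determinant of equation (1). A short sketch of that expansion, with names chosen here for illustration and NumPy assumed available:

```python
import numpy as np

E = np.array([[11.1171, 2.8394], [2.8394, 12.9941]])   # residual
T = np.array([[6.9634, 3.1027], [3.1027, 10.0438]])    # varieties
S = E + T

# For 2 x 2 matrices, |E - U S| = det(S) U^2 - b U + det(E), where
# b = E11*S22 + E22*S11 - 2*E12*S12.
a = np.linalg.det(S)
b = E[0, 0] * S[1, 1] + E[1, 1] * S[0, 0] - 2 * E[0, 1] * S[0, 1]
c = np.linalg.det(E)

# Larger root first, matching the convention 1 > U1 > U2 > 0.
U1, U2 = sorted(np.roots([a, -b, c]).real, reverse=True)

print(round(a, 4), round(b, 4), round(c, 4))
print(round(U1, 4), round(U2, 4), round(U1 * U2, 4))
```

The product of the two roots reproduces the overall criterion U, since the constant term divided by the leading coefficient is det(E)/det(E + T).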
We use $\chi^2$ approximations. For testing U, it has $kn_1$ d.f. and is

$$\chi^2 = -[(n_1 + n_2) - \tfrac{1}{2}(k + n_1 + 1)] \log_e U$$

where k is the number of variates. Thus

$$\chi^2 = -[(24 + 72) - \tfrac{1}{2}(2 + 24 + 1)] \log_e .3578 = 90.95, \qquad \text{d.f.} = 2 \times 24 = 48.$$

For testing $U_1$, it has $(k - 1)(n_1 - 1)$ d.f. and is

$$\chi^2 = -[(n_1 + n_2) - \tfrac{1}{2}(k + n_1 + 1)] \log_e U_1 .$$

We obtain $\chi^2 = 35.17$, d.f. = 23.


TABLE 3

Root      d.f.     $\chi^2$
$U_2$      25      55.78**
$U_1$      23      35.17*
Total      48      90.95

Note that $U = U_1 \times U_2$ and that the smaller root, $U_2$, is tested first.
The complement of $U_2$ is the square of a multiple correlation coefficient,
which has been maximized by the choice of a linear combination of the
two observations. A second linear combination, uncorrelated with the
first, is associated with the complement of $U_1$. The new variates are
canonical variates; the correlations are canonical correlations. They are
further discussed in section 4.

We interpret the $\chi^2$ table as follows: the significant value of U
indicates real varietal differences. This suggests we examine $U_1$ and
$U_2$. If $U_1$ were not significant, the significant $U_2$ would indicate that
variety pairs did not depart significantly from a line, a space of one
dimension, and that varieties could be discriminated among by a single
linear function of the paired yields. From the significant $U_1$, we conclude
this is not the case; there appears to be variety × year interaction.
The variety pairs fail to lie in a space of one dimension.


4. Transformations 


In an analysis of variance with years as a source of variation,
varieties and varieties × years mean squares are usually tested. These
involve a sum and a difference of the pairs of observations. Consider a
linear transformation giving two new variables, multiples of the sum
and difference, viz.,

$$\begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix} \begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix} = \begin{pmatrix} (y^{(1)} + y^{(2)})/\sqrt{2} \\ (y^{(1)} - y^{(2)})/\sqrt{2} \end{pmatrix}.$$

Multiplication is row-by-column. Note that the squares of the elements
in each line of the transforming matrix sum to unity
and that the sum of the cross-products of the lines is zero. Such a
matrix is said to be orthogonal. (If one had three years' data on a
perennial crop, an appropriate transformation might involve the sum
and linear and quadratic effects. Thus,

$$\begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \\ -1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 1/\sqrt{6} & -2/\sqrt{6} & 1/\sqrt{6} \end{pmatrix}$$

would be the transformation. The sum of squares of the elements in
any line is unity and the sum of cross-products for any two lines is
zero.)


In the univariate case if the variable x has variance $s^2$, then the
variable ax has variance $a^2 s^2$. Analogously in the bivariate case, if
the covariance matrix of $(y^{(1)}, y^{(2)})$ is

$$\begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix}$$

then the covariance matrix of

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix}$$

is

$$\begin{pmatrix} a^2 s_{11} + 2ab\,s_{12} + b^2 s_{22} & ac\,s_{11} + (ad + bc)s_{12} + bd\,s_{22} \\ ac\,s_{11} + (ad + bc)s_{12} + bd\,s_{22} & c^2 s_{11} + 2cd\,s_{12} + d^2 s_{22} \end{pmatrix}. \tag{2}$$

Applying (2) to the covariance matrices of Table 2, we obtain


TABLE 4
Bivariate analysis of (sum, diff)

Source       d.f.        M. Sq.          Components of M. Sq.
Replicates     3      .3633    .5548       .0063    .0227
                      .5548   1.3513       .0227    .0489
Varieties     24      .4836   -.0642       .0692   -.0128
                     -.0642    .2250      -.0128    .0242
Residual      72      .2068   -.0130
                     -.0130    .1280

Components of mean square may be calculated directly from the new
mean squares or by transforming the original components of mean
square. The use of an orthogonal transformation matrix leaves unchanged
the sum of the diagonal elements, or trace, of the covariance
matrix. This serves as a partial check on the numerical results.
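
The transformation and the trace check can be sketched in a few lines of Python. The code below is an illustration rather than the original computing procedure: it writes equation (2) in matrix form as A S A', with A the sum-and-difference matrix, and applies it to the mean-square matrices of Table 2. The dictionary keys and variable names are assumptions made for this example.

```python
import numpy as np

# Orthogonal sum-and-difference transformation (rows: sum, difference).
A = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

# Mean-square matrices from Table 2 (rows and columns: 1949, 1950).
mean_squares = {
    "Replicates": np.array([[1.4121, -0.4940], [-0.4940, 0.3025]]),
    "Varieties": np.array([[0.2901, 0.1293], [0.1293, 0.4185]]),
    "Residual": np.array([[0.1544, 0.0394], [0.0394, 0.1805]]),
}

# Equation (2) in matrix form: covariance of A y is A S A'.
transformed = {name: A @ S @ A.T for name, S in mean_squares.items()}

for name, S_new in transformed.items():
    # The trace is printed alongside as the partial check described above.
    print(name, np.round(S_new, 4), round(float(np.trace(S_new)), 4))
```

The printed matrices agree with the mean squares of Table 4 to rounding, and each trace equals that of the corresponding matrix in Table 2.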


TABLE 5

Source         d.f.      S.S.       M.S.     Location
Years             1    35.0870    35.0870
Reps              3     1.0898     0.3633    Reps, left upper
Reps × Yrs        3     4.0538     1.3513    Reps, right lower
Varieties        24    11.6062     0.4836    Vars, left upper
Yrs × Vars       24     5.4010     0.2250    Vars, right lower
Reps × Vars      72    14.8952     0.2069    Residual, left upper
Residual         72     9.2160     0.1280    Residual, right lower

Total           199    81.3490

The analysis of variance with years as a source of variation is given
in Table 5. The column "Location" states where, in the bivariate
analysis, the corresponding mean square is to be found. The mean
square for years is available from the grand means of the bivariate
analysis as $(4.18 - 3.34)^2 \times 100/2 = 35.28$, differing from that of the
analysis of variance due to rounding of the means.

It is to be noted that only the diagonal terms of the bivariate 
analysis are in the analysis of variance. The bivariate analysis con- 
tains additional covariance terms often worth detecting by the ex- 
perimenter. 

The roots of the determinantal equation (1) remain unchanged by 
the transformation. 

An alternate transformation suggested by the data themselves is
available, viz. that which leads to canonical variates. The relation
between the canonical variates first discussed by Hotelling (2) and
those of this example is stated by Bartlett (1). For the first variate,
the coefficients of $y^{(1)}$ and $y^{(2)}$ are found by solving the equations

$$\left[ R_1^2 \begin{pmatrix} E_{11} + T_{11} & E_{12} + T_{12} \\ E_{21} + T_{21} & E_{22} + T_{22} \end{pmatrix} - \begin{pmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{pmatrix} \right] \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = 0 \tag{3}$$

where $R_1^2 = 1 - U_2$; $U_2$ was the smaller root of equation (1). The
complement of each root is the square of a canonical correlation. Compare
equations (3) and (1).

Bartlett (1) shows that this canonical variate is such that its treatment
to treatment + residual sum of squares ratio, viz. $R^2$, is a maximum.
Clearly the canonical variate is the discriminant function often
defined as the linear function of the original observations for which
the ratio of treatment to residual sum of squares is maximum.

Since both $U_1$ and $U_2$ are significant, the dependence of the data,
after removal of replicate differences, upon variety effects is not adequately
explained by the above canonical variate. A second canonical
variate, uncorrelated with the first, may be obtained by replacing $R_1^2$
by $R_2^2 = 1 - U_1$ in equation (3).

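For the first canonical variate, equation (3) is

$$\left[ .4446 \begin{pmatrix} 18.0805 & 5.9421 \\ 5.9421 & 23.0379 \end{pmatrix} - \begin{pmatrix} 6.9634 & 3.1027 \\ 3.1027 & 10.0438 \end{pmatrix} \right] \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = 0$$

or $1.0752a_1 - .4608a_2 = 0$ and $-.4608a_1 + .1989a_2 = 0$. Hence
$a_1 = .429a_2$ or $a_2 = 2.33a_1$ and the canonical variate may be written as
$.429y^{(1)} + y^{(2)}$.

For the second canonical variate, equation (3) is

$$\left[ .3559 \begin{pmatrix} 18.0805 & 5.9421 \\ 5.9421 & 23.0379 \end{pmatrix} - \begin{pmatrix} 6.9634 & 3.1027 \\ 3.1027 & 10.0438 \end{pmatrix} \right] \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = 0$$

and $a_1 = -1.868a_2$ or $a_2 = -.535a_1$. The canonical variate may be
written as $1.868y^{(1)} - y^{(2)}$.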
Dependence of the data upon variety effects is maximum for the
variate $.429y^{(1)} + y^{(2)}$; any remaining dependence is maximum for
$1.868y^{(1)} - y^{(2)}$. Neither variate seems particularly close to either
$y^{(1)} + y^{(2)}$ or $y^{(1)} - y^{(2)}$, variates which have natural appeal. However,
the fraction of (treatment + residual) sum of squares accounted for by
variety effects is $R_1^2 = .4446$ for $.429y^{(1)} + y^{(2)}$ and is
24 × .4836/(24 × .4836 + 72 × .2068) = .4380 for $y^{(1)} + y^{(2)}$. For $1.868y^{(1)} - y^{(2)}$,
the fraction is $R_2^2 = .3559$ and is (24 × .2250)/(24 × .2250 + 72 ×
.1280) = .3695 for $y^{(1)} - y^{(2)}$. The canonical variates are uncorrelated whereas the
other two are not, though their correlation appears to be small. Apparently
the variates $y^{(1)} + y^{(2)}$ and $y^{(1)} - y^{(2)}$, if appropriately used,
could perform a satisfactory discrimination. (Of course, the variates
to be considered depend upon the questions whose answers are required.)
For a univariate model, there would seem to be little choice
between one with additive year and variety effects and one with variety
effects multiplied by a year constant.
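
The four fractions compared above are all instances of one quadratic-form ratio, a'Ta / (a'Ta + a'Ea), evaluated for different coefficient vectors a. A minimal sketch under that reading, with a hypothetical helper `variety_fraction` and the Table 2 sums of squares and products:

```python
import numpy as np

E = np.array([[11.1171, 2.8394], [2.8394, 12.9941]])   # residual
T = np.array([[6.9634, 3.1027], [3.1027, 10.0438]])    # varieties


def variety_fraction(a):
    """Fraction of (treatment + residual) S.S. due to varieties for a'y."""
    a = np.asarray(a, dtype=float)
    treatment = a @ T @ a
    residual = a @ E @ a
    return treatment / (treatment + residual)


f_canon1 = variety_fraction([0.429, 1.0])    # first canonical variate
f_sum = variety_fraction([1.0, 1.0])         # y(1) + y(2)
f_canon2 = variety_fraction([1.868, -1.0])   # second canonical variate
f_diff = variety_fraction([1.0, -1.0])       # y(1) - y(2)

print(round(f_canon1, 4), round(f_sum, 4), round(f_canon2, 4), round(f_diff, 4))
```

With two variates the first canonical variate gives the largest attainable fraction and the second the smallest, so the sum and the difference necessarily fall between the two.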

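The analysis of sums of squares for any new variate is

$$(a_1 \;\; a_2) \begin{pmatrix} T_{11} + E_{11} & T_{12} + E_{12} \\ T_{21} + E_{21} & T_{22} + E_{22} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = (a_1 \;\; a_2) \begin{pmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} + (a_1 \;\; a_2) \begin{pmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}.$$

When two canonical variates are required, as in this case, it may be
desired to compute their bivariate analysis, vanishing of the correlation
serving as a computational check. Using equation (2), we obtain
Table 6. The transforming matrix is not orthogonal.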

TABLE 6
Bivariate analysis of canonical variates

Source       d.f.          S.S.                  M.S.
Replicates     3       .4156     .3549       .1385    .1183
                       .3549   21.2257       .1183   7.0752
Varieties     24     13.9875     .0013       .5828    .0001
                       .0013   22.7504       .0001    .9479
Residual      72     17.4763     .0007       .2427    .0000
                       .0007   41.1784       .0000    .5719


TABLE 7

Variety                    1      2      3      4      5
.429y^(1) + y^(2)       5.85   5.50   5.69   5.98   6.18
1.868y^(1) - y^(2)      1.56   1.20   2.55   1.84   1.95

Variety                    6      7      8      9     10     11     12
.429y^(1) + y^(2)       4.98   5.30   5.10   6.06   6.08   5.50   6.44
1.868y^(1) - y^(2)      1.33   1.09   2.11   2.05   1.98   2.40   2.02

Variety                   13     14     15     16     17     18     19
.429y^(1) + y^(2)       5.63   5.62   5.48   5.29   5.96   5.74   5.25
1.868y^(1) - y^(2)      1.68   2.71   2.06   3.15   2.18   2.22   2.28

Variety                   20     21     22     23     24     25   Grand
.429y^(1) + y^(2)       5.36   5.30   5.22   6.06   5.32   5.30    5.61
1.868y^(1) - y^(2)      2.39   1.45   2.04   2.16   2.49   2.59    2.06


5. The Canonical Variates and the Interpretation of the Data 


Table 7 contains the values of the 25 transformed means. These 
variables are not correlated and there is little value in observing their 
graph. This property of independence is useful in making exact prob- 
ability statements. 

Interpretation of the data comes within the province of the experi- 
menter. The preceding analysis, indicating the need of two discriminant 
functions, raises an even more difficult problem than does a significant 
F in a univariate analysis. This was to be expected since a multivariate 
analysis broadens the basis of our null hypothesis and requires no 
assumption about homogeneity of variance, an assumption that may be 
false. 

As a temporary expedient, the author proposes the following analysis
of variance technique. From Table 6, obtain the residual sum of
squares and divide by (72 - 1) × 4 to obtain the variance of a treatment
mean for 4 replicates. The use of (72 - 1) in place of 72 is due to
the fact that the ratio of the coefficients for the first canonical variate
was obtained from the data. We obtain $s^2 = .0616$ and $s = .25$. Discriminate
among means for the first canonical variate (Table 7) by
some standard technique such as Duncan's New Multiple Range Test
using (number of means + 1) where the number of means is required.
Justify this on the basis that the new variate has a ratio of coefficients
determined from the data. The use of (residual d.f. - 1) and (number
of variates + 1) is suggested by the analysis of variance argument often
presented with a discriminant function analysis.

For a perennial crop at a single location, the experimenter is pre- 
sumably interested in a variety which, for yield, is persistently good. 
Such varieties can be discriminated among by a single function. Thus, 
an analysis of the second canonical variate, perhaps as above using 
(residual d.f. — 2), should be with a view to finding out something about 
the variety and/or year that produced significance. This analysis helps 
locate varieties that are consistently good (poor) but sometimes do even 
better (poorer) than expected and ones that are good (poor) in some 
years and not exceptional or are even poor (good) in others. Other 
characteristics of such varieties or the frequency of the sort of year, 
i.e. the total environment, that produced such results should lead to a 
decision on retaining or discarding such varieties.

AN INTRODUCTORY COURSE IN BIOMETRY FOR
GRADUATE STUDENTS IN BIOLOGY*


C. I. 


The Connecticut Agricultural Experiment Station and 
Yale University, New Haven, Connecticut 


At the Third International Biometric Conference in 1953, a 
symposium on the first course in biometry was organized by Professor 
W. G. Cochran, who provided each participant with a list of topics for 
discussion. The present paper, from this symposium, is based upon 
eleven years of teaching biometry to graduate students in biology at 
Yale University. These students are potential research biologists and 
the course is intended to provide them with an essential research tool. 
During this period, 86 students have received University credit for the 
course and perhaps 25 more were serious auditors. Since these students 
would judge the effectiveness of the course from a different viewpoint 
than their instructor, I both queried my last two classes and sent a 
questionnaire to all earlier students whose address was known. Of 
some 85 questionnaires distributed, 75 have been returned. These 
student opinions will be considered in relation to each topic on the 
agenda of the symposium. 

Interests and preparation of the instructor and students. Although 
biometry concerns both the mathematical and statistical aspects of 
biology, the content of an introductory course depends upon the interests 
and preparation of both the instructor and the students. My course 
has been primarily statistical. This was unavoidable in view of the 
limited mathematical background of the majority of my students. It 
may be rationalized by the readier applicability of statistics than of 
mathematics in most biological research. The mathematical models 
that suffice for the statistical aspects are relatively simple, and can be 
applied in areas as distinct as botany, pharmacology, zoology, forestry, 
microbiology and the medical and agricultural sciences. Graduate 
students from most of these fields have attended my course, often in 
the same class. 

Biometry can be defined in so many ways that the viewpoint of the 
instructor largely determines the character of a course. Hence, it is 
pertinent to report my own background, which was primarily biological, 
starting with undergraduate and graduate majors in zoology and 


*Presented at the Third International Biometric Conference, Bellagio, Italy, Sept., 1953.

followed by seven years as a research entomologist. My research
projects required an increasing use of statistical method, so that later 
I studied for two years with Professor R. A. Fisher, and since 1938 
have worked entirely as a biometrician, primarily on experimental 
problems in agriculture and pharmacology. In 1943, I began teaching 
biometry to graduate students at Yale University, originally in two 
alternating courses, one primarily for pharmacologists and the other 
for botanists and zoologists. These were combined in 1950. Despite 
many changes in content and approach through the years, all students 
will be assumed here to have taken a single course. 

The distribution of students among the different biological fields 
is shown in Table 1, about 90 percent of them being men. Only two of 


TABLE 1
Major field of students in the course and field of employment, where known, of
graduates.

                               Number in each field as     Number of
Field                          Students     Auditors       graduates
Pharmacology                      31           10              28
Other medical sciences             9            6              11
Zoology                           14            1              10
Forestry                          21            2              13
Other plant sciences               9            2               8
Mathematics and statistics         2            3               5
Other areas                        -            1               7
Total                             86           25              82


those taking the course for credit had majored in either mathematics or 
statistics. The majority may have had introductory calculus, usually 
so long before that it had been largely forgotten. To insure a common 
basis, an initial chapter in the Outline, which now serves as our text, 
reviews the elementary mathematics that is assumed. If any of it 
seems strange, the student is referred to the book by Professor Walker 
(6). Although statistics is not a prerequisite, about one student in five 
has taken the subject before, and a few more have had lectures on 
statistics in other courses. Since completing their biometry at Yale, 
about one in ten has taken further work in statistics. 

The course has not been a recruiting ground for professional 
statisticians or biometricians, as evidenced in Table 1 by the employ- 
ment, where known, of the students who have graduated. By and 
large, each has remained in biology, in the field for which he trained;
only three or four have gone into statistics or biometry professionally.
So far as their activities could be determined, about 58 percent are 
primarily in research, about 30 percent in teaching and the remaining 
12 percent in other activities, usually neither biometrical nor statistical. 

Purpose of the course. The objective of the course is to train future 
professional biologists to use intelligently the statistical methods re- 
quired in the design and analysis of biological investigations. How 
well was this purpose achieved? 

One criterion would be the extent to which they applied statistical 
techniques after graduation. The questionnaire asked how extensively, 
on a scale of 0 to 3, each respondent was involved in the design and in 
the analysis of experiments, and on the same scale, how much biometry 
was used in the process. Four out of five respondents gave scores of 1 
to 3 to this kind of activity, and within this range their scores averaged 
2.0 and 2.1 for design and analysis respectively. The more these former 
students were involved in either operation, the more they utilized their 
biometry (P = .01). 

Another question asked them to list the principal statistical 
techniques that they had used since taking the course. As summarized 
in Table 2, the list includes methods that would be considered fairly 
sophisticated. Some of them have not been taught until recently, so 
that the relative frequencies are only suggestive. These techniques 
were listed under three headings: (a) “methods where the information
gained from my course was sufficient”, totalling 64 percent, (b)
“methods where a moderate amount of additional study enabled you to
proceed on your own”, 22 percent, and (c) “methods covered so
briefly, if at all, that you had to learn them almost entirely from other
sources”, 14 percent. The extent to which a statistical method was
used depended in large part upon its having been included in the course. 
Relatively few were learned later de novo. This seems to me a good 
reason for covering much ground rapidly rather than less more 
thoroughly. 

A student should not complete a course with a false idea of how
much he knows but be more apt than before to consult a trained statistician.
One question, therefore, read as follows: “Have you consulted
a statistician or biometrician in connection with your research? If so,
to what extent has my course prepared you for these conferences?”
Of 65 who answered the first question, 36 had consulted a statistician,
and of the 35 who answered the second question, 32 considered the
preparation given by the course as adequate, the other three giving
qualified answers, such as “fine, but only after considerable experience”.
The following are some comments: “I think this is the chief value of a

TABLE 2
Statistical techniques used after completing the course and the frequency with which
each was mentioned in 75 questionnaires.

Technique                                                  Frequency
Experimental design, randomized blocks, Latin squares         27
All-or-none bioassays, LD50's, probit analysis                22
Bioassays with a graded response                              17
Factorial design and analysis                                 12
Sampling techniques
Standard deviation, standard error, etc.
Partitioning hereditary components

*Discriminant function, tests for normality, estimation of number of observations needed, quality
control, mathematical derivations, construction of mathematical and probability models.

**Effects of population density, transformations, antagonism and synergism, confidence limits,
power functions, non-normal distributions, practical mathematical statistics, expectation, test
construction and validation, graphic representation of equations.


good course in statistics. Most experiments do not lend themselves to
routine treatment and a well-trained statistician needs to be consulted
intelligently.” “... problems readily recognized.” “Course has been
indispensable for understanding biometricians.” These and other
replies indicate a lively appreciation of the value of the statistical 
consultant in biological research. 

Time required. Most students find biometry a difficult subject and 
students and faculty alike have protested the time it requires, repre- 
senting about 1/8 of the course work for a doctor’s degree. To a bio- 
metrician this is not excessive in view of the basic importance of the 
subject, but to a subject-matter department, the requirement seems 
alarmingly high. In trying to meet these complaints, I have experimented
constantly in my teaching, so that the course has changed over
the years in content, in timing, and in many other details.

Initially, I gave a weekly two-hour lecture, plus laboratory, for 
one 14-week semester. Most students seemed to reach the saturation 
point after one hour, so that we shifted to two one-hour lectures, ex- 
tending the course to two semesters. This conflicted with one of the 
graduate programs, so that we changed again to three one-hour lectures 
a week for one semester, and in alternate years I gave the hardier 
students two one-hour lectures a week in the second semester. After a 
considerable trial, three lectures a week has proved too concentrated, 
and we are now returning to two lectures a week through the academic 
year. 

In our experience, most statistics is learned by working illustrative 
examples and studying their meaning, so that the statistical laboratory 
has been a basic part of our program from the very start. Nominally, 
two hours of guided laboratory instruction is required for each hour of 
lecture, but students usually need an additional two hours of laboratory 
on their own. 

This past year the small size of my class has enabled me to give 
each student an individual 45-minute tutorial each week, in addition to 
lectures and laboratory. When queried at the end of term on procedure 
for a larger class, one would prefer a session alone in alternate weeks, 
two a weekly session with several students, and one tutorials held in the 
laboratory. All of them urged continuing the experiment. 

My attitude towards examinations has changed through the years. 
At the beginning each student was assigned a problem at the end of the 
course and asked to turn in an answer at his convenience. Now I give 
several examinations in a semester, each consisting of a closed-book, 
written quiz, and of one or more problems to be worked on the calculator 
with books open. Restricting each examination to the material covered 
in one section of the course has improved student morale. 

Statistical Laboratory. Except for an introductory taste test, the 
laboratory exercises consist primarily of computing and understanding 
selected numerical examples. After working through a variety of appli- 
cations, the student should recognize more readily opportunities for 
increasing the efficiency of his own research. A laboratory assistant, 
who has majored or minored in mathematics, holds individual or group 
conferences, quizzing each student on what he is doing and why. 

To allow some selection, the examples number about 250 in the first 
17 chapters of the syllabus but students seldom work as many as one in 
five. An electric calculator is provided, and tedium is reduced by 
supplying the basic terms for each problem wherever possible, such as
the total sum of squares. Very few examples have been invented or
reduced artificially to one or two digits. The solution of full-sized 
examples is intended to reduce the gap between the form in which raw 
data reach the investigator and the form to which the student becomes 
accustomed. Several students in fact have recommended that some 
examples should be completely unclassified, with no clues beyond a 
statement of the biologist’s objective. One called for “more experience 
in handling raw data, following a work form is not sufficient”, a recom- 
mendation that has been strongly seconded by my most recent class. 

The importance of the laboratory cannot be overestimated. Auditors 
without a statistical background who skip the laboratory soon find the 
lectures unintelligible. During lecture, students are introduced to the 
subject, but they learn it only in the laboratory. One question in the 
questionnaire was: “How would you rate the relative usefulness of
lectures and laboratory?” Of the 62 replies, 26 considered the laboratory
more useful than the lectures, 21 thought they were of equal value and
seven found the laboratory less useful than the lectures. Their comments
were often emphatic, among them the following: “I did my actual
learning in the laboratory but lecture discussions are an essential
supplement.” “A course such as this demands both laboratory and
lecture, one without the other would be of little value.” “Significance
of what was said in the lecture often didn’t strike home until after a
few specific problems were wrestled with.” “That material on which
I had spent most time in laboratory has stayed with me and has been
more useful.” “Lectures more useful in the latter part of the course
after we had learned the basic principles of statistics.” “Need constant
practice in the laboratory to grasp lecture material.”

Course content. The course is taught from an Outline which has 
been developed over a period of years (1). Its primary purpose is to 
free the student of basic note-taking during lecture, but it is sufficiently 
detailed to serve as the principal text. Readings are assigned in Fisher’s 
“Statistical Methods for Research Workers” (2) and “The Design 
of Experiments” (3), and students are required to have “Statistical 
Tables for Biological, Agricultural and Medical Research” by Fisher 
and Yates (4). While it was in preparation, lectures followed the 
Outline very closely. 

The course starts with a class experiment based upon the tea tasting
test in Fisher’s “Design of Experiments” but substituting fresh and
reconstituted skim milk. This leads to the binomial distribution, the
χ² distribution and contingency tables. Following chapters on the
normal distribution, and interval estimation, the class has its first
examination. The analysis of variance is introduced with the comparison
of two groups and developed progressively throughout the remainder
of the course. The chapter on simple experimental designs is followed 
by regression with one dependent and one independent variable, and by 
factorial experiments. At this point a second pair of examinations 
has been customary. 

Two further chapters on regression consider parallel line bioassays
and associated measurements. The discontinuous Poisson and negative
binomial distributions are introduced next, leading to transformations
for the analysis of variance and other ways of meeting its assumptions. 
I have been unable to go beyond this point in 42 lectures. The remain- 
ing topics in the syllabus have not yet been developed in as much detail 
as the first 17 chapters, and have varied from year to year. 

More generally, the lectures consider the logic and importance of 
each procedure in solving illustrative biological problems, emphasizing 
experimental design in each case. Many of the underlying assumptions 
are expressed in simple mathematical models. Thus in the analysis of 
variance the additive model, the distinction between models I and II, 
and variance components are introduced at an early stage. Additivity 
is demonstrated numerically by isolating for each constituent a table 
of differences, which, when squared and summed, leads to its sum of 
squares in the analysis of variance. Many equations for basic statistics, 
such as χ² and the regression coefficient, are presented in several forms, 
suitable for computing data collected in different ways. The advantages 
of adapting the equation to fit each major type of problem rather than 
adapting the data to fit a single equation outweigh in my opinion the 
apparent simplicity of a single general equation. 

Syllabus. The general headings and principal subdivisions of the 
course are summarized in the following syllabus. In outline form, the 
first 17 chapters vary in length of text from 3 to 13 pages. 

1. Computing instructions: points in arithmetic, symbolism, number 
of significant figures, operation of desk calculators, use of statistical 
and computing tables. 

2. A taste experiment: underlying concepts, design, interpretation 
based on the null hypothesis, criteria of rejection, relation to probability 
and randomization, algebra of combinations. 

3. The binomial distribution: some characteristics, sample and 
population defined, structure of analysis, expected and observed fre- 
quencies, parameters and statistics of the binomial. 

4. The χ² distribution: characteristics, the theoretical distribution, 
comparison of observed and expected frequencies, comparison of 
binomial statistics and parameters. 

5. Analysis of proportionate frequencies: χ² test for 2 × k con- 
tingency tables, four-fold or 2 × 2 tables, including models, Yates’ 
correction, Fisher’s exact test, and measures of association. 

6. The normal distribution: characteristics of a normal variate, 
relation to other distributions, theoretical normal distribution, graphic 
tests for normality in large and small samples, grouping, graphic 
estimation of mean and standard deviation, transformation to a normal 
metameter, contaminated and truncated distributions. 

7. Numerical analysis of normal samples: criteria for adequacy of 
a statistic, statistics from individual observations and from a frequency 
distribution, non-efficient estimates, precision, comparison of observed 
and expected frequencies, tests of skewness and kurtosis, the rejection 
of outliers. 

8. Interval estimation: interval estimates, Student’s distribution, 
limits for the mean and for the variance, confidence vs. fiducial intervals, 
graphic limits for the mean. 

9. The comparison of two groups: logical basis, comparison of two 
variances, the F distribution, comparison of two means, analysis of 
variance, a ranking test. 

10. The comparison of several groups: structure of the comparison, 
tests for homogeneity of the variances, a quick test for comparing group 
means, analysis of variance, Models I and II for the analysis of variance, 
just significant difference and range, variance components. 

11. Simple experimental designs: comparison of paired treatments, 
randomized groups or blocks, Latin squares, split plots, missing values. 

12. Regression: assumptions and objectives, linear regression 
equations, analysis of variance of linear regressions, sampling errors, 
transformations to linear form, non-linear regression with orthogonal 
polynomials. 

13. Factorial experiments: advantages, types of factor, analysis of 
two-factor experiments, experiments with three or more factors, error 
term, control of heterogeneity. 

14. Bioassays from parallel regressions: role of the dosage-response 
curve, types of bioassay, potency from parallel log-dose response lines, 
factorial determination of potency, precision of the estimated potency, 
assays with two or more unknowns, replicated assays. 

15. Associated measurements: statement of a typical problem, linear 
functional relations, bivariate normal distribution, correlation co- 
efficient, significance of observed correlations, partial correlation, 
graphic tests for association. 

16. Two discontinuous distributions: constant vs. varying expecta- 
tions, Poisson distribution, χ² tests, indirect estimates of the Poisson 
parameter, negative binomial distribution, tests for agreement, estimat- 
ing the negative binomial k from several series. 

17. Meeting the assumptions of the analysis of variance: objectives 
and assumptions of the analysis of variance, non-additivity in a cross- 
classification, transformations for discrete variates and for continuous 
variates, other methods for controlling heterogeneity in the error. 

18. Additional comparisons of proportionate frequencies. 

19. Probit analysis for all-or-none data. 

20. Covariance. 

21. The comparison of slopes and slope-ratio assays. 

22. Partial regression and discriminant analysis. 

23. Additional designs for controlling heterogeneity. 

24. Components of variance and the combining of experiments. 

25. Some sampling techniques. 

The several chapters in the syllabus are not of equal difficulty. 
In an attempt to assess this factor, my last two classes were asked in 
their final examinations to rank the first 17 chapters in order of in- 
creasing difficulty. The individual rankings from eight students in 
1952-53 and from four students in 1954 were converted to normalized 
scores (4) and averaged separately for each year. The chapters and 
mean scores have been listed in Table 3 in order of increasing difficulty 
as determined from the average of the means in each set, although the 
class rankings differed in detail. The “just significant range” beneath 
each column in the table has been computed by Keuls’ definition (5) 
for comparisons of two to five items. 

The individual scores for each year have been examined by the 
analyses of variance in Table 4. The more recent the chapter the more 
difficult it was judged (row 1), the trend being more pronounced in 1954 
than in 1952-53. After allowing for this trend, the chapters still varied 
significantly from one another in difficulty for both classes (row 2). 
The correlation between mean scores for the two years, after removing 
the trend on order, was suggestive but not significant (r = 0.41). A 
comparison of the error mean squares indicates greater agreement among 
members of the second, smaller class. 
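The variance ratios quoted for Table 4 can be checked by simple division. The sketch below assumes, as the printed ratios suggest, that the trend was tested against the mean square for chapters around the trend and that the chapters were tested against the remainder; the variable and function names are illustrative only.

```python
# Mean squares and variance ratios as printed in Table 4.
TABLE_4 = {
    "1952-53": {"trend_ms": 34.712, "chapters_ms": 2.249,
                "f_trend": 15.43, "f_chapters": 5.17},
    "1954": {"trend_ms": 31.825, "chapters_ms": 1.118,
             "f_trend": 28.48, "f_chapters": 5.20},
}


def f_ratio(numerator_ms, denominator_ms):
    """Variance ratio F = numerator mean square / denominator mean square."""
    return numerator_ms / denominator_ms


for year, row in TABLE_4.items():
    # Assumption: trend is tested against "chapters around trend".
    f_trend = f_ratio(row["trend_ms"], row["chapters_ms"])
    # Remainder mean square implied by the printed F for chapters.
    implied_remainder_ms = row["chapters_ms"] / row["f_chapters"]
    print(year, round(f_trend, 2), round(implied_remainder_ms, 3))
```

The recomputed trend ratios agree with the printed ones to within rounding of the mean squares.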

Student comments. Four out of five questionnaires contained com- 
ments and suggestions concerning the course. About one former 
student in three testified as to the value of the course in his career, 
especially in the design of experiments, in the evaluation of data, and in 
his ability to read the literature critically. One of the best students 
placed the blame for any difficulties he had experienced upon himself. 
One who audited the course reported that, “I hardly see how I could 
operate without it.” Another commented, “that your course has given 
me a decided professional advantage over most of my colleagues, who 
are for the most part abysmally ignorant regarding biometry”, a view- 
point expressed by several others. One commented that, “The single 
most important contribution to my own preparation was to lay before 
me the logical method of attacking a scientific problem, whether the 
experiments are to be analyzed statistically or not.” No doubt every 
teacher of biometry has received similar reports from former students. 

TABLE 3 

Chapters 1-17 in syllabus arranged in order of increasing difficulty from average of 
the mean scores for two classes, with the just significant range for comparing 2 to 5 
items in each year (5). 

Chapter       Mean score 
  No.     1952-53       1954        Subject 
   1       -1.36       -1.79        Computing instructions 
   2       - .87       - .99        A taste experiment 
   6       - .08       - .99        Normal distribution 
   3       - .47       - .83        Binomial distribution 
   7       - .46       - .56        Calculation of normal samples 
   9       - .75         .12        Comparison of two groups 
  10       - .39         .26        Comparison of several groups 
   4       - .08         .04        χ² distribution 
   5         .32     [illegible]    Contingency tables 
  11       - .20         .38        Simple experimental designs 
   8         .26       - .09        Interval estimation 
  15         .52         .36        Associated measurements 
  16         .41         .56        Distribution of counts 
  14        1.26         .04        Log-ratio bioassays 
  17         .34         .98        Meeting the assumptions of Anova 
  13     [illegible]    1.67        Factorial experiments 
  12        1.29        1.19        Regression 

Just significant range for 2 to 5 means in each column: 
   2         .65         .66 
   3         .78         .79 
   4         .86         .87 
   5         .92         .93 

TABLE 4 

Analysis of variance of the scores averaged in Table 3. 

                                   1952-53                      1954 
Term                        D.F.     M.S.       F       D.F.     M.S.       F 
Trend on order of study       1    34.712    15.43        1    31.825    28.48 
Chapters around trend        15     2.249     5.17       15     1.118     5.20 
Remainder                   112  [illegible]             48      .215 

Many comments were more critical. Opposite changes in emphasis 
have been recommended. Eight comments are typified by the following: 
“More fundamental theory would have helped, I got a sense of using a 
cook book in solving problems, which is annoying.” Others expressed 
the opposite view. A former assistant reported “biologists didn’t seem 
too keen about theoretical matters, they wondered why and how an F 
value was reached but didn’t like to go into the mathematics of it”. One 
comment reads, “the most important aspect of biometry from my ex- 
perience is proper application, how the formulae are derived is not as 
important in this applied field”. 

Several wanted more specialization within branches of science, but 
others were willing to settle for more examples from which to choose. 
Only four comments questioned rather mildly the desirability of teach- 
ing a single course in biometry to biologists from quite different fields, 
a basic feature of the course. In at least one case, taking the course 
changed the student’s original viewpoint as indicated by the following 
comment: “Although I disagreed with you on this point, I want to 
congratulate you on having the courage to conduct a course for all 
biologists. I disagree completely with those who ‘pigeonhole’ the 
different fields of biology.” In another reply, this basic tenet is con- 
sidered to present a major difficulty “in the fact that each student 
brings such a widely different background to the course. While the 
skeletal outline of the course is adequate, each student must be given 
special attention to a degree not warranted in other courses and this 
attention must be a function of his background, needs, tastes and 
objectives. Perhaps an impossible order but no other course has this 
inherent and unfortunate complexity.” 

Some suggestions concerned the laboratory. Two wanted examples 
of the misuse of statistical technique and instruction in what not to do 
as well as in what to do. One wanted more emphasis upon evaluating 
the method used in obtaining biological data. One proposed “that the 
laboratory exercises include in each section at least one very simple 
example, one in which the arithmetic is so very simple that the calcula- 
tions can be followed without a calculator and at home.” A former 
assistant spent “considerable time teaching the students elementary 
statistics before they could proceed to the assigned examples. As long 
as provision is made for this supplemental teaching, I would recommend 
continuing a highly concentrated course instead of diluting it.” How 
the laboratory assistant should spend his time is debatable. He might 
discuss informally with small groups of students the mathematical 
background of biometrical relations to compensate for their gaps in 
basic mathematics. Alternatively, he could concentrate on practical 
advice on actual computations and on the biological interpretation of 
results. The latter is the primary intent of laboratory instruction, 
although only the laboratory assistant may be able to appreciate which 
background concepts need explaining to a particular student. 

The most frequent criticism from a dozen or more former students 
was that “too much material was covered in too short a period of time.” 
Another wrote “one had the feeling he was just coming out of a fog when 
the instructor rushed onto something new and lost us again”. The 
remedy suggested most frequently was “to extend the course to a full 
year rather than a semester” and this recommendation has now been 
adopted with two one-hour lectures a week. 

One other concept appeared in several questionnaires. This is the 
idea of a delayed response in learning statistics. One former student 
wrote, “I entered this course completely ignorant of statistical techniques 
and theory. I was somewhat confused and bewildered when I finished. 
However, since returning to my normal work, I find that I can work 
fairly well in a statistical sense in my own field and more than hold my 
own statistically with my associates.” A second commented: “What 
stands out in my mind, even though it is now six years later, is the 
extreme practicability of the course. I am astonished now that I got 
as much as I did, since my feeling when attending the course was often 
one of ‘can’t see the woods for the trees’.” Another wrote: “I suspect 
that I never began to feel ‘easy’ about even the simplest statistical 
methods until I began to try to teach a little statistics to medical 
students.” One who is now teaching a semester of biometry to zoology 
majors is “convinced that during the course you can only expose. The 
real learning comes later through solving particular problems. Only 
later does the usefulness of the course become evident. At the end of 
your course I would have rated it as mediocre, but ever since I have had 
a clarity of thinking in the field that is extremely useful.” These 
last comments suggest that in teaching biometry there is a latent period 
between the stimulus of teaching and the response of learning and that 
the real effectiveness of a course cannot be judged until later. 
THE THEORY OF BACTERIAL CONSTANT 
GROWTH APPARATUS 


C. C. Spicer 
Central Public Health Laboratory, Colindale, London, N.W.9 


Recent work in bacterial genetics has emphasised the usefulness of 
apparatus whose purpose is to maintain a constant population of 
bacteria in a state of active growth. Several such devices have been 
described, for example by Monod (1950), Novick and Szilard (1950), 
and Perret (1954). 

It seems worthwhile to give a short account of the underlying theory 
as a guide to the problems of design likely to be encountered with 
organisms of different growth characteristics or under various conditions 
of culture. 

Mathematically the problem may be stated as follows. Consider 
an organism growing freely in a limited, constant volume of nutrient. 
After some period of growth, factors come into play which depress the 
power of the organism to divide and eventually stop it growing alto- 
gether. These factors may be of several different kinds: for example, 
exhaustion of nutrient, insufficiency of oxygen, or production of some 
toxic metabolite. The general effect, however, is that the growth rate of 
the population at any time is a function of its size (n), so that 

\[ \frac{1}{n}\frac{dn}{dt} = f(n) \]

where f(n) is some function of n. If the organism is growing in some 
apparatus which is constantly renewing the medium and concurrently 
removing a fraction β of the organisms per unit time, the equation of 
growth becomes: 

\[ \frac{1}{n}\frac{dn}{dt} = f(n) - \beta \]

and the washing-out rate required to maintain a population of a given 
size is found by solving the equation f(n) = β. Such an equilibrium 
is not necessarily stable. For instance, if the population is growing 
exponentially it is not possible in practice to maintain a constant 
number by simply renewing the medium, as small discrepancies between 
the growth rate of the organism and the turnover of the medium always 
occur which result in either washing out of the organisms or over- 
growth. 

In general, if the equilibrium population n̄ by some accident becomes 
(n̄ + η), where η is small compared with n̄, we have 

\[ \frac{d\eta}{dt} = (\bar{n} + \eta)\, f(\bar{n} + \eta) - \beta(\bar{n} + \eta) \]

If η is small, f(n̄ + η) can be expanded by Taylor’s Theorem and, ignoring 
terms in η² and higher powers of η, 

\[ \frac{d\eta}{dt} = (\bar{n} + \eta)\,[f(\bar{n}) + \eta f'(\bar{n})] - \beta(\bar{n} + \eta) \]

At equilibrium β = f(n̄), so that to the first order in η 

\[ \frac{d\eta}{dt} = \bar{n}\, f'(\bar{n})\, \eta \]

Now if the equilibrium is to be stable any change in η must cause an 
opposite change in dη/dt, i.e. if η is positive dη/dt must be negative and 
vice versa. So, in general, equilibrium is only stable if f′(n̄) is negative. 
In other words there can be no stability unless the growth rate de- 
creases as the concentration of organisms increases. 
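The stability condition can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: the two growth-rate functions and all parameter values are hypothetical, the slope f′(n̄) is estimated by a central difference, and the growth equation with washing-out is integrated by simple Euler steps.

```python
def slope(f, n_bar, h=1e-6):
    """Central-difference estimate of f'(n_bar)."""
    return (f(n_bar + h) - f(n_bar - h)) / (2.0 * h)


def is_stable(f, n_bar):
    """Equilibrium at n_bar is stable only if the growth rate falls with n."""
    return slope(f, n_bar) < 0.0


def simulate(f, beta, n_start, t_end=50.0, dt=0.01):
    """Euler integration of dn/dt = n * (f(n) - beta)."""
    n = n_start
    for _ in range(int(round(t_end / dt))):
        n += dt * n * (f(n) - beta)
    return n


def declining_rate(n):
    """Hypothetical growth rate that decreases as the population rises."""
    return 1.0 - n / 100.0


def constant_rate(n):
    """Hypothetical exponential growth: rate independent of population."""
    return 1.0


n_bar = 50.0
beta = declining_rate(n_bar)               # washing-out rate with f(n_bar) = beta
print(is_stable(declining_rate, n_bar))    # slope is negative
print(is_stable(constant_rate, n_bar))     # slope is zero, so not stable
print(simulate(declining_rate, beta, 55.0))  # displaced population returns to n_bar
```

With the constant rate, a displaced population neither returns nor departs when β exactly equals the growth rate, and any small mismatch in β leads to washing out or overgrowth, as described in the text.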

The most completely worked out system so far used is the chemostat 
of Novick and Szilard (1950a, 1950b). This applies to an organism 
dependent on a nutrient factor present in such a limiting quantity that 
small variations in concentration can cause corresponding variations in 
growth rate. Then, if c is the concentration in the growth tube, the 
equation for the growth of the organism is 

\[ \frac{1}{n}\frac{dn}{dt} = F_1(c) - \beta \]

and the corresponding equation for changes of concentration is 

\[ \frac{dc}{dt} = \beta(a - c) - F_2(n, c) \]

Here, a is the concentration of nutrient in the incoming medium, and 
F₂(n, c) is a function describing the rate at which the nutrient is taken 
up by the organism. 

Novick and Szilard have shown, for several nutrient factors, that 
over a certain range of c we can write 

\[ F_1(c) = \lambda c, \qquad F_2(n, c) = \kappa n c \]

where λ and κ are constants. A similar approximation should hold for 
any nutrient in limiting concentration. 

The differential equations of growth under these circumstances are 

\[ \frac{1}{n}\frac{dn}{dt} = \lambda c - \beta, \qquad \frac{dc}{dt} = \beta(a - c) - \kappa n c \]

At equilibrium the concentrations of organisms (n̄) and nutrient (c̄) can 
be found by equating the derivatives to zero, which gives 

\[ \bar{c} = \beta/\lambda, \qquad \bar{n} = \lambda(a - \bar{c})/\kappa \]


The general solution of the two simultaneous non-linear differential 
equations for n and c cannot be conveniently given in general terms. 
It is possible, however, to investigate the response to small displacements 
from the equilibrium position. In the region of equilibrium put n = 
(n̄ + η), c = (c̄ + ξ) where η and ξ are so small that their squares and 
product may be neglected. Substituting these variables in the growth 
equations it is found that 

\[ \frac{d\eta}{dt} = \lambda\bar{n}\,\xi, \qquad \frac{d\xi}{dt} = -\kappa\bar{c}\,\eta - (\beta + \kappa\bar{n})\,\xi \]

The solution of this pair of equations can be put in the form 

\[ \eta = A_1 e^{\mu_1 t} + A_2 e^{\mu_2 t}, \qquad \xi = B_1 e^{\mu_1 t} + B_2 e^{\mu_2 t} \]

where the coefficients A and B are determined by the initial conditions 
and μ₁ and μ₂ are the roots of the quadratic equation (using λc̄ = β 
and κn̄ = λ(a − c̄)) 

\[ x^2 + [\beta + \lambda(a - \bar{c})]\,x + \lambda\beta(a - \bar{c}) = 0 \]

Now, so long as a > c̄, which is necessary if a constant population is 
to be maintained, the roots of this equation are real and negative. 
Consequently any small displacement from equilibrium dies away 
exponentially without oscillations and the steady state is always stable. 
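As a check on the linear analysis, the equilibrium and the two exponents can be computed for a particular case. The sketch below assumes the linear forms F₁(c) = λc and F₂(n, c) = κnc; the parameter values are hypothetical and chosen only to give round numbers. The quadratic is taken as x² + (β + κn̄)x + λκn̄c̄ = 0, whose roots reduce to −β and −λ(a − c̄).

```python
import math


def chemostat_equilibrium(lam, kappa, beta, a):
    """Equilibrium (n_bar, c_bar) of the nutrient-limited system."""
    c_bar = beta / lam
    n_bar = lam * (a - c_bar) / kappa
    return n_bar, c_bar


def chemostat_exponents(lam, kappa, beta, a):
    """Roots of x^2 + (beta + kappa*n_bar) x + lam*kappa*n_bar*c_bar = 0."""
    n_bar, c_bar = chemostat_equilibrium(lam, kappa, beta, a)
    p = beta + kappa * n_bar
    q = lam * kappa * n_bar * c_bar
    # The discriminant equals (beta - lam*(a - c_bar))**2, so it is never
    # negative; the guard only protects against rounding error.
    root = math.sqrt(max(p * p - 4.0 * q, 0.0))
    return (-p + root) / 2.0, (-p - root) / 2.0


# Hypothetical values: lam = 2, kappa = 0.5, beta = 1, a = 3.
n_bar, c_bar = chemostat_equilibrium(2.0, 0.5, 1.0, 3.0)
mu_1, mu_2 = chemostat_exponents(2.0, 0.5, 1.0, 3.0)
print(n_bar, c_bar)   # 10.0 0.5
print(mu_1, mu_2)     # -1.0 -5.0, i.e. -beta and -lam*(a - c_bar)
```

Both exponents are real and negative whenever a > c̄, which is the condition stated in the text.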

It is worth pointing out that if a population of organisms grows 
according to the conditions specified by Novick and Szilard, but without 
washing out or removal of nutrient, then its growth curve follows the 
well known “Logistic Law”. Eliminating c from the differential equa- 
tions of growth we have 

\[ \frac{1}{n}\frac{dn}{dt} = (\lambda c_0 + \kappa n_0) - \kappa n \]

where n₀ and c₀ are the initial concentrations of organism and nutrient. 
This is the equation of a logistic population whose final size is 

\[ n_\infty = \frac{\lambda c_0 + \kappa n_0}{\kappa} \]

The number of organisms which have been produced from a unit con- 
centration of nutrient is then 

\[ \frac{n_\infty - n_0}{c_0} = \frac{\lambda}{\kappa} \]

so that κ/λ, the reciprocal of this ratio, is the amount of growth factor 
required to make a single organism. 
Writing the equation of the logistic curve in the form 

\[ n = \frac{n_\infty}{1 + e^{-\epsilon t}} \]

with the origin of t at the time when n = n∞/2, then the value of the 
constant ε is given by 

\[ \epsilon = \lambda c_0 + \kappa n_0 \]

As a first approximation to the behaviour of the organism in a constant 
growth apparatus, it can be considered to be growing logistically but 
being at the same time washed out, so that 

\[ \frac{1}{n}\frac{dn}{dt} = \epsilon\left(1 - \frac{n}{n_\infty}\right) - \beta \]

Under the general stability conditions f′(n) is negative and equilibrium 
is always stable; also 

\[ \bar{n} = n_\infty\left(1 - \frac{\beta}{\epsilon}\right) \]

so that theoretically any desired population < n∞ can be maintained. 
However, the smaller the population the greater β must be, and unless 
it is regulated with great accuracy it is liable to exceed ε and the popula- 
tion will then be washed out. 
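The sensitivity of small populations to the washing-out rate can be made concrete. This is a sketch under the logistic approximation, taking the equilibrium as n̄ = n∞(1 − β/ε); the values ε = 1 and n∞ = 100 and the 2 per cent error in β are hypothetical.

```python
def washout_rate(epsilon, n_inf, n_target):
    """Washing-out rate beta that holds the population at n_target."""
    return epsilon * (1.0 - n_target / n_inf)


def equilibrium_population(epsilon, n_inf, beta):
    """Equilibrium population for a given beta; zero once beta reaches epsilon."""
    if beta >= epsilon:
        return 0.0  # the population is washed out
    return n_inf * (1.0 - beta / epsilon)


epsilon, n_inf = 1.0, 100.0
for n_target in (90.0, 50.0, 1.0):
    beta = washout_rate(epsilon, n_inf, n_target)
    # Effect of setting beta 2 per cent too high.
    achieved = equilibrium_population(epsilon, n_inf, 1.02 * beta)
    print(n_target, beta, achieved)
```

With these values a 2 per cent error in β barely alters a target population of 90 but carries β beyond ε for a target of 1, so that the population is washed out.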

It is probable that the equilibria attained with rather poor media 
will all be of the general type discussed here. The general form of the 
dependence of growth rate on a limiting nutrient is approximately 
exponential so that 


As the concentration of growth factor is increased it ceases to have an 
increasing effect on growth, while at low concentrations its effect is 
approximately linear. Differential equations of growth which contain 
this form for F₁(c) have a stable equilibrium similar to that for the 
simple linear form, but with a different time constant. There are no 
oscillations about the equilibrium. 

As a contrast to the case of growth limited by shortage of growth 
factor discussed above it is worth considering a simple model of a 
population which is limited by production of some toxic substance. 
There is no example of this kind that has been so well worked out as 
Novick and Szilard’s nutrient scheme, but there is no doubt that toxic 
limitation can occur. It could be imitated in a constant growth appa- 
ratus by adding an antibiotic at a rate governed by the density of 
organisms. 

Taking only the simplest case, the differential equations of the 
system would be 

\[ \frac{1}{n}\frac{dn}{dt} = (\lambda - \beta) - \nu c, \qquad \frac{dc}{dt} = \gamma n - \beta c \]

The constants ν and γ here represent the lethal effect of toxin on 
the organisms and its rate of production by them. In the absence of 
washing-out (β = 0) the organisms eventually become extinct while 
the concentration of toxin rises logistically to a constant value. Under 
constant growth conditions an equilibrium is established when 

\[ \bar{c} = \frac{\lambda - \beta}{\nu}, \qquad \bar{n} = \frac{\beta\bar{c}}{\gamma} \]

The equations for determining the stability of the equilibrium are 

\[ \frac{d\eta}{dt} = -\nu\bar{n}\,\xi, \qquad \frac{d\xi}{dt} = \gamma\eta - \beta\xi \]

where, as before, η refers to the disturbance of bacterial numbers and 
ξ to that of toxin concentration. The solutions are of the exponential 
form given above, and the coefficients of t in the exponents are the 


230 BIOMETRICS, JUNE 1955 


roots of the quadratic equation 

\[ x^2 + \beta x + \beta(\lambda - \beta) = 0 \]

When β > 4/5 λ both roots are negative and real and the equilibrium 
is stable. If β < 4/5 λ, then the roots are complex with negative 
real parts and the equilibrium is still stable, but is reached by damped 
oscillations. 
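The change from a monotonic to an oscillatory approach at β = 4λ/5 can be verified directly. The sketch assumes the quadratic x² + βx + β(λ − β) = 0 for the toxin scheme, with λ the growth rate in the absence of toxin; its discriminant is β(5β − 4λ). The numerical values are hypothetical, and β < λ is required for a positive equilibrium population.

```python
import cmath


def toxin_exponents(lam, beta):
    """Roots of x^2 + beta*x + beta*(lam - beta) = 0, as complex numbers."""
    disc = beta * beta - 4.0 * beta * (lam - beta)  # equals beta*(5*beta - 4*lam)
    root = cmath.sqrt(disc)
    return (-beta + root) / 2.0, (-beta - root) / 2.0


def approach(lam, beta):
    """Describe how a small displacement dies away."""
    x1, x2 = toxin_exponents(lam, beta)
    if x1.imag == 0.0 and x2.imag == 0.0:
        return "exponential decay"
    return "damped oscillation"


lam = 1.0
print(approach(lam, 0.9))   # beta > 4*lam/5
print(approach(lam, 0.5))   # beta < 4*lam/5
print(toxin_exponents(lam, 0.5))
```

In the oscillatory case the real part of both roots is −β/2, so the oscillations are always damped.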

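The general equations of equilibrium, which are applicable to both 
toxic and nutritional schemes, can be derived from the two differential 
equations 

\[ \frac{1}{n}\frac{dn}{dt} = F_1(c) - \beta, \qquad \frac{dc}{dt} = \beta(a - c) + F_2(c, n) \]

by expanding the functions F₁ and F₂ about the point (n̄, c̄) in a Taylor’s 
series. This procedure gives, for small displacements, 

\[ \frac{d\eta}{dt} = \bar{n}\,\frac{\partial F_1}{\partial c}\,\xi, \qquad \frac{d\xi}{dt} = \left(\frac{\partial F_2}{\partial c} - \beta\right)\xi + \frac{\partial F_2}{\partial n}\,\eta \]

The quadratic equation whose roots are the coefficients of t in the 
solution is 

\[ x^2 + \left(\beta - \frac{\partial F_2}{\partial c}\right)x - \bar{n}\,\frac{\partial F_1}{\partial c}\,\frac{\partial F_2}{\partial n} = 0 \]

For stable equilibrium ∂F₁/∂c and ∂F₂/∂n must be of opposite sign and 
if ∂F₂/∂c > 0 then ∂F₂/∂c < β. 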
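The general criterion can be written as a short test on the two coefficients of the quadratic. The sketch assumes the quadratic takes the form x² + (β − ∂F₂/∂c)x − n̄(∂F₁/∂c)(∂F₂/∂n) = 0, with F₁ depending on c only; both roots of a quadratic have negative real parts exactly when both coefficients are positive. The numerical values for the two schemes are hypothetical.

```python
def is_stable_general(n_bar, beta, dF1_dc, dF2_dc, dF2_dn):
    """Stability of the equilibrium from the partial derivatives at (n_bar, c_bar)."""
    p = beta - dF2_dc                # coefficient of x
    q = -n_bar * dF1_dc * dF2_dn     # constant term
    # Both roots have negative real parts if and only if p > 0 and q > 0.
    return p > 0.0 and q > 0.0


# Nutrient scheme, assuming F1 = lam*c and F2 = -kappa*n*c,
# with lam = 2, kappa = 0.5, beta = 1, a = 3 (so n_bar = 10, c_bar = 0.5).
print(is_stable_general(n_bar=10.0, beta=1.0,
                        dF1_dc=2.0, dF2_dc=-0.5 * 10.0, dF2_dn=-0.5 * 0.5))

# Toxin scheme, assuming F1 = lam - nu*c and F2 = gamma*n,
# with lam = nu = gamma = 1, beta = 0.5 (so n_bar = 0.25).
print(is_stable_general(n_bar=0.25, beta=0.5,
                        dF1_dc=-1.0, dF2_dc=0.0, dF2_dn=1.0))

# Derivatives of the same sign give an unstable equilibrium.
print(is_stable_general(n_bar=10.0, beta=1.0,
                        dF1_dc=2.0, dF2_dc=0.0, dF2_dn=1.0))
```

The first two cases reproduce the stability found for the nutritional and toxic schemes; the third shows the failure of the sign condition.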

Summary. A mathematical analysis is presented of the mechanism 
of certain types of bacterial constant growth apparatus. The con- 
ditions of equilibrium and the nature of response to displacements 
from it are derived. 


Acknowledgements 


I am very grateful to Dr. P. Armitage for his helpful comments 
on this paper. 


References 


Monod, J. (1950). La technique de culture continue. Ann. Inst. Pasteur 79: 390 (Oct.). 

Novick, A., and Szilard, L. (1950a). Experiments with the Chemostat on spontaneous 
mutations of bacteria. Proc. Nat. Acad. Sci. 36: 708. 

Novick, A., and Szilard, L. (1950b). Description of the Chemostat. Science 112: 715. 

Perret, J. (1954). In the press. 



AN INVERTED MATRIX APPROACH FOR DETERMINING 
CROP-WEATHER REGRESSION EQUATIONS* 


Harold F. Huddleston 


U.S.D.A. Agricultural Marketing Service 
Washington, D. C. 


Introduction 


We would like to know whether year-to-year changes, or month-to- 
month changes in crop yields or prospects are consistent with observed 
weather data. Generally, historical weather records extend back 
farther than records of crop yields. We wish to make use of weather 
data for the entire period of record even though yield data may be 
available for a much shorter period. This paper reports on an ex- 
ploratory inverted matrix approach used in one phase of a crop-weather 
study. 

The application of multiple regression methods in the study of re- 
lationships between crop yields and weather factors is, of course, not 
new, but the large amount of computational labor involved has dis- 
couraged many workers and our people from attempting correlation 
studies on a very extensive scale. As pointed out by R. A. Fisher, the 
use of the inverse matrix solution of a set of normal equations greatly 
reduces the amount of computations when the same set of independent 
variables is used repeatedly; in addition, it serves to simplify the 
calculation of sampling errors of the regression coefficients. However, 
a large amount of computational work is still required when the various 
dependent variables are available for only relatively few years, and 
these periods vary from crop to crop because of the fact that the data 
or series were started at different points in time. We would like some 
way of utilizing all the weather and crop yield data available. There- 
fore, we would like to devise what might be called a “generalized inverse 
matrix solution” for a given State or area which could be used whenever 
the given set of weather factors were appropriate. However, the 
sampling errors of the regression coefficients cannot be computed using 
the elements of this generalized solution where the dependent variables 
are used for only a subperiod. 

The inverse matrix solution is obtained for a given set of independent 
variables (i.e., weather factors) for the entire period of the weather 
records, or at least some fairly long period of years. It is found that 
the elements of the inverse matrix exhibit stability as the length of the 
period is increased. The elements or their ratios are used with the 
covariance terms between yields and each of the weather factors for 
the various subperiods for which crop yield data are available. Obvious- 
ly, the underlying assumption which is made is that the interrelationship 
between given weather factors will remain fairly constant and become 
more reliable over time. In the example used it is assumed that this 
stability is over years for fixed months. 


*Paper given before the Biometric Sessions, Gainesville, Florida, March 1954. 


Nature of Study 


In order to clarify the ideas and procedures suggested in the present 
study, an example of an application is given. The State of Illinois has 
arbitrarily been selected for examination, and the method is illustrated 
for corn yields. The study was conducted in the following manner. A 
linear relationship between yields and monthly rainfall and temperature 
data was used. Linear regressions have been found to give fairly 
satisfactory results in many cases for these variables. The functional 
relationship used was as follows: 

\[ Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 \]

where 
Y = yield per acre, 
X₁ = average monthly rainfall for State, 
X₂ = average monthly temperature for State, 
X₃ = product of average monthly rainfall and temperature for the 
State, or X₃ = X₁·X₂ neglecting decimals. 

Since in many multiple regression studies joint effects may be 
important, the product (X₃) of rainfall and temperature was included 
as a third factor. The utility of the third factor has been pointed out 
by Hendricks and Scholl¹ where an understanding of the effects of 
weather is of interest, and its inclusion appears desirable for a generalized 
regression approach. 

The period selected for study was 1891-1950. The rainfall and 
temperature data for the month of July were selected for examination. 
The inverse matrix solutions were computed for the following sub- 
periods as well as the entire period: (1) 1891-1910; (2) 1911-1930; (3) 
1931-1950; and (4) 1911-1950. Table I indicates the Cᵢⱼ values for 
the various periods, where the Cᵢⱼ are defined by the following set 
(i.e., i = 1, 2, 3) of equations, where k = 20, 40, 60: 

\[
\begin{aligned}
C_{i1}\sum^{n_k} x_1^2 + C_{i2}\sum^{n_k} x_1 x_2 + C_{i3}\sum^{n_k} x_1 x_3 &= 1,\ 0,\ 0 \\
C_{i1}\sum^{n_k} x_1 x_2 + C_{i2}\sum^{n_k} x_2^2 + C_{i3}\sum^{n_k} x_2 x_3 &= 0,\ 1,\ 0 \\
C_{i1}\sum^{n_k} x_1 x_3 + C_{i2}\sum^{n_k} x_2 x_3 + C_{i3}\sum^{n_k} x_3^2 &= 0,\ 0,\ 1
\end{aligned}
\]

TABLE I 
Inverse Solutions: Illinois July Weather Data 

                    20 Year Periods                   40 Year Period   60 Year Period 
         1891-1910      1911-1930      1931-1950        1911-1950        1891-1950 
C11     +3.4509        +1.2288        +7.7944         +14.001          +5.0598 
C12     +0.14000       +0.093251      +0.23438         +0.451389       +0.16964 
C13     -0.047482      -0.015188      -0.10123         -0.18371        -0.066759 
C22     +0.015796      +0.022006      +0.014153        +0.019131       +0.0087353 
C23     -0.0017839     -0.00091567    -0.0030412       -0.0058850      -0.0022128 
C33     +0.00062649    +0.00019709    +0.0013209       +0.0024136      +0.00088277 

¹Agricultural Experiment Station Technical Bulletin #74, “Techniques in Measuring Joint 
Relationships,” North Carolina State College, 1943. 


A study of the values in Table I indicates the absolute values of 
the Cᵢⱼ will vary considerably from one period to the next. However, 
the ratios of the Cᵢⱼ to each other are of interest in studying the tendency 
for stability of interrelationships of weather factors. In addition, 
Table II below shows the ratios of Cᵢⱼ to C₁₁ for each period. 


TABLE II 
Ratios Cᵢⱼ to C₁₁ 

                    20 Year Periods                   40 Year Period   60 Year Period 
         1891-1910      1911-1930      1931-1950        1911-1950        1891-1950 
C11'     1.00000        1.00000        1.00000          1.00000          1.00000 
C12'      .04056         .07589         .03007           .03224           .03353 
C13'    - .01376       - .01236       - .01299         - .01312         - .01319 
C22'      .004577        .01791         .001816          .001366          .001726 
C23'    - .0005169     - .0007452     - .0003902       - .0004203       - .0004373 
C33'      .0001815       .0001604       .0001695         .0001724         .0001745 
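The ratios in Table II follow from Table I by dividing each multiplier by C₁₁ for the same period. The sketch below repeats that division for the 60-year period (1891-1950) using the values printed in the two tables; the dictionary and function names are illustrative only.

```python
import math

# Multipliers for 1891-1950 from the last column of Table I.
TABLE_I_60_YEAR = {
    "C11": 5.0598, "C12": 0.16964, "C13": -0.066759,
    "C22": 0.0087353, "C23": -0.0022128, "C33": 0.00088277,
}

# Ratios for 1891-1950 as printed in the last column of Table II.
TABLE_II_60_YEAR = {
    "C11": 1.00000, "C12": 0.03353, "C13": -0.01319,
    "C22": 0.001726, "C23": -0.0004373, "C33": 0.0001745,
}


def ratios_to_c11(multipliers):
    """Divide every multiplier by C11 for the same period."""
    c11 = multipliers["C11"]
    return {name: value / c11 for name, value in multipliers.items()}


computed = ratios_to_c11(TABLE_I_60_YEAR)
for name, value in computed.items():
    agrees = math.isclose(value, TABLE_II_60_YEAR[name], rel_tol=1e-3)
    print(name, value, TABLE_II_60_YEAR[name], agrees)
```

The recomputed ratios agree with the printed ones to the four significant figures given.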




An inspection of the ratios reveals several things: (1) relationships 
based upon 20 years of weather data may be expected to have little 
reliability for subsequent years; (2) in general, it would seem advisable 
that relationships using weather data for as large an area as a State 
should be based upon at least 40 years of data in order to obtain stable 
relationships among the weather factors; and (3) the ratios are fairly 
stable from period to period in contrast to their absolute values. 

The utility of the multipliers for a long period of years which could 
be used as “population values,” i.e., C'ᵢⱼ, as indicated by this analysis 
appears to be dependent upon: (1) finding a quick method of estimating 
a factor of proportionality, K, by which one can convert the ratios to 
absolute units, or (2) using the ratios of the Cᵢⱼ, as in Table II, to 
compute regression coefficients proportional to the net regression 
coefficients; then obtain the relationship between yields and the weather 
factors by plotting the computed regression values (using the pro- 
portional regression coefficients) against the actual yields or deviations 
from the average yield. Further study of the variances and covariances 
involved appears necessary before any conclusion can be made con- 
cerning the feasibility of determining a suitable value of K a priori. 

The multipliers in Table I for any of the periods may be used with 
any number of crop yields for the same period for the State by computing 
the respective covariance terms. The computational work is, therefore, 
considerably reduced. The data in Table II for the 60-year period
(last column on right) are thought of as a “general solution”.

As an example of the use indicated in (2), the yield of corn is corre- 
lated with the July weather data for Illinois. The proportional net 
regression coefficients are computed as follows: 


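$$
\begin{aligned}
b'_{y1.23} &= C'_{11} \sum x_1 y + C'_{12} \sum x_2 y + C'_{13} \sum x_3 y \\
b'_{y2.13} &= C'_{12} \sum x_1 y + C'_{22} \sum x_2 y + C'_{23} \sum x_3 y \\
b'_{y3.12} &= C'_{13} \sum x_1 y + C'_{23} \sum x_2 y + C'_{33} \sum x_3 y
\end{aligned}
$$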


where $\sum x_1 y$, $\sum x_2 y$, and $\sum x_3 y$ are sums of products of deviations from
means for the yield of corn per harvested acre ($Y$) with the monthly
averages of rainfall ($X_1$), temperature ($X_2$) and the product of temper-
ature and rainfall ($X_3$) for the period 1911-1950 after the yields have
been adjusted for trend (i.e., by use of 10-year averages). The $C_{ij}$
used are based upon the period 1891-1950 in equation 1) and 1911-1950
in equation 2). The regression equation for computing values from the
proportional regression coefficients is:


$$Y'_c = b'_{y1.23} x_1 + b'_{y2.13} x_2 + b'_{y3.12} x_3$$
or
1) $Y'_c = -2.315 x_1 - .1295 x_2 + .0375 x_3$
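
The computation of the proportional coefficients amounts to multiplying the symmetric array of ratios by the vector of sums of products. The sketch below assumes the 60-year column of Table II supplies the $C'_{ij}$; the sums of products in the usage lines are hypothetical values chosen only for illustration, since the paper does not print them.

```python
# Ratios C'_ij for 1891-1950, from the last column of Table II, arranged
# as a symmetric 3 x 3 array (index 0 = rainfall, 1 = temperature,
# 2 = product of temperature and rainfall).
C_PRIME = [
    [1.00000, 0.03353, -0.01319],
    [0.03353, 0.001726, -0.0004373],
    [-0.01319, -0.0004373, 0.0001745],
]


def proportional_coefficients(c_prime, sums_xy):
    """Return b'_i = sum over j of C'_ij * sum(x_j y), for i = 1, 2, 3."""
    return [sum(c * s for c, s in zip(row, sums_xy)) for row in c_prime]


if __name__ == "__main__":
    # Hypothetical sums of products of deviations (NOT from the paper).
    example_sums_xy = [-2.0, -30.0, 100.0]
    print(proportional_coefficients(C_PRIME, example_sums_xy))
```

Because each $C'_{ij}$ is $C_{ij}$ divided by $C_{11}$, coefficients computed this way differ from the net regression coefficients for the same period only by the common factor $C_{11}$.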


The actual regression between the adjusted yields and weather factors 
determined from the data for the period 1911-1950 is given by the 
following equation: 


$$Y_c = b_{y1.23} x_1 + b_{y2.13} x_2 + b_{y3.12} x_3$$
or

2) $Y_c = -24.59 x_1 - 1.2142 x_2 + .353 x_3$


The values of $Y_c$ and $Y'_c$ are plotted against the deviations of
yield from trend in Chart I.

An inspection of Chart I indicates that there is little difference in 
the relationship found by use of the actual data for the period 1911-1950 
and the ratios of the $C_{ij}$ for the period 1891-1950 with the covariance
terms for 1911-1950. 

A factor (K) which can be used to convert the proportional regression
coefficients to the actual values will be equal to the slope of the re- 
gression line in Part B of Chart I. That is, K (determined by least 
squares method) multiplied by the proportional regression coefficients 
will give the regression in absolute units. A comparison of the co- 
efficients in equations 1) and 2) indicates a factor of about 10 is needed 
to convert the proportional coefficients to an absolute basis. 
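
The comparison of coefficients can be made explicit by taking the ratio of each coefficient in equation 2) to its counterpart in equation 1). This is only a rough check, assuming the coefficients as transcribed from the two equations; it is not the least squares determination of K from Chart I described above.

```python
# Coefficients as printed in equations 1) and 2), in the order x1, x2, x3.
PROPORTIONAL = [-2.315, -0.1295, 0.0375]   # equation 1)
ACTUAL = [-24.59, -1.2142, 0.353]          # equation 2)


def coefficient_ratios(actual, proportional):
    """Ratio of each actual coefficient to its proportional counterpart."""
    return [a / p for a, p in zip(actual, proportional)]


if __name__ == "__main__":
    for name, ratio in zip(("x1", "x2", "x3"),
                           coefficient_ratios(ACTUAL, PROPORTIONAL)):
        print(f"{name}: {ratio:.2f}")
```

The three ratios fall between 9 and 11, consistent with the factor of about 10 noted in the text.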

The $C_{ij}$’s (or $C'_{ij}$’s) in column 5 of Table I (or II) can similarly be
used with various subperiods corresponding to the years for which the 
individual crop yield data are available. However, we would prefer, 
in general, to express the yield data as a percent of the normal or average 
yield rather than as deviation in absolute units. If the yield data are 
expressed in the percentage form, year-to-year changes are indicated 
by the ratio of the two years. The percentage change can then be 
converted to bushels per acre rather than determining the regression
coefficients in their true or absolute units.

Conclusions: While a fairly large amount of computational work is 
involved in any multiple regression technique, it is believed that a 
generalized regression approach may be useful in many situations. The 
utilization of lengthy weather records to establish stable relationships 
among weather factors with determination of the covariance terms 
where yields are available for a much shorter period of time would 
appear practical based on preliminary results. In addition, the time 
and costs required to compute the inverse matrix solutions are not 
nearly so formidable with the aid of modern computing machines as 
has been the case in the past. It is possible that if work can be expanded
along these lines a more objective means for estimating the effects of
weather factors on crop production from available weather records can 
be used to supplement the current procedures of the Crop Reporting 
Board. 




QUERIES 


George W. Snedecor, Editor


QUERY 114: A recent query (107, March, 1954) presented an interesting
discussion of some points on Sheppard’s correction. I would like to
raise some additional points on application of the
correction in making tests of differences between means or analysis of 
variance tests. The pertinent reference again is Fisher. I also checked 
M. G. Kendall’s “Advanced Theory of Statistics’’. 

In my case I was supplied with a set of data in frequency distribution 
form. Unfortunately, the class interval was rather wide, 200 units, 
while the estimated standard deviation was about 270 units (based on 
the grouped data). On the other hand, the data included the means 
calculated from the original ungrouped observations for each treatment 
combination. 

After completing the analysis without correction, it occurred to me 
that perhaps the matter of Sheppard’s correction should be considered. 
Hence, I checked the references noted above, but was not satisfied 
with the information obtained. That is, I was not told exactly why the 
correction was not to be applied for tests of significance even though it 
seemed to be appropriate for estimation. 

In my situation it appeared to me that since I had means based on 
original data it might be appropriate to apply the correction for esti- 
mating the variance of a difference between means. Upon carrying 
out the necessary calculations, I found the correction to the second 
moment to be large, but the actual effect on the final value of Student’s 
t or a normal deviate, Z, to be negligible. 
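
The size of the effect described can be checked with the figures given in the query. The sketch below applies the usual Sheppard correction to the second moment, $h^2/12$ for class width $h$, using the class interval of 200 units and the standard deviation of about 270 units quoted above; the function name is illustrative.

```python
import math


def sheppard_corrected_variance(grouped_variance, class_width):
    """Subtract h^2 / 12 from a variance computed from grouped data."""
    return grouped_variance - class_width ** 2 / 12.0


if __name__ == "__main__":
    h = 200.0   # class interval quoted in the query
    s = 270.0   # approximate standard deviation from the grouped data
    corrected = sheppard_corrected_variance(s ** 2, h)
    print("correction to the variance:", h ** 2 / 12.0)
    print("corrected standard deviation:", math.sqrt(corrected))
    # A t statistic is inversely proportional to the standard deviation,
    # so this ratio is the factor by which t would change.
    print("ratio of uncorrected to corrected s.d.:", s / math.sqrt(corrected))
```

With these figures the correction removes several thousand units from the variance, yet the standard deviation, and hence a value of t based on it, changes by only roughly 2 per cent.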

In discussing the matter with a colleague this point of view was 
suggested: When both the mean and standard deviation are calculated 
from a grouped frequency distribution, the two statistics are both in 
error by some amount and the direction of the error for the mean is
unknown. Thus, one might recommend, as does Fisher, “do not apply 
the correction for tests of significance” and the long-run results should 
be all right. 

Question: (1) What is the real basis for Fisher’s advice? and (2) 
Was I right in not applying Sheppard’s correction for my case? 


ANSWER: The basis of Fisher’s advice was that grouping introduces an
additional component of variance of which the magnitude is known on
the assumption of perfect grouping, e.g. that the true measurements of
those classed as 17 units do all lie between 16.5 and 17.5 exactly, and
are all that lie between those limits. For an analysis of variance the
effect is simply to add this fixed quantity to all
mean squares, so reducing the probability that they should be unequal 
at any chosen ratio. In effect, errors of grouping, like other errors of 
random sampling, lower the precision with which any comparison can 
be made. Their exact and particular effects are always unknown, 
although the average magnitude is known, and is what is removed from 
the variance in making Sheppard’s correction. 

In your case errors of grouping have not been introduced in calcu- 
lating the means to be compared, but only in calculating the estimate 
of error. I should, in such a case, apply the correction to the latter 
before testing the significance of the former. 


R. A. FISHER


QUERY 115: Marvin Zelen in a recent issue of Biometrics (p. 273,
Vol. 10) states “almost every experiment in the physical sciences
is characterized by the block being a ‘natural experimental
unit’.” This terminology is not in accordance with the generally
accepted (?) idea that the experimental unit is part of the array of
experimental material (including perhaps a classification of material
by time or other extraneous attributes) which receives a treatment
independently of other parts within the restrictions of the design.
What exactly does the term “natural experimental unit” mean?


ANSWER: I am not quite certain that I fully understand the query.
However, I shall amplify my statement concerning “natural experimental
units” in the hope that this will also satisfactorily answer the
query. First to quote Cochran and Cox, in their
book Experimental Designs (p. 15), ‘‘We shall use the term experimental 
unit to denote the group of material to which a treatment is applied 
in a single trial of the experiment. The unit may be a plot of land, a 
patient in a hospital, or a lump of dough, or it may be a group of pigs 
in a pen, or a batch of seed.” 

The reason for using the adjective natural was to further emphasize 
that in physical science applications, the block arises because of some 
natural grouping of the experimental material or because of limitations 
in applying the different treatments. On the other hand, in many 
agricultural field trials a plot of land is selected for the experiment and 
the land is arbitrarily partitioned into blocks for the purpose of the 
experiment. That is, the partitioning of the land into blocks or units 
is not unique and usually depends on the convenience of the individual 
who is planning the work.

In many experiments there is a natural limitation within the
experiment itself which determines the block. For example, in an
experiment on eye preparations which are to be tested on humans, the
block might consist of an individual and thus only two different
preparations could be applied within any one block, one for each eye.
The eye is the experimental unit; eyes come in pairs to form a “natural”
block, and there is nothing the experimenter can do to change the
situation, unless of course, he has access to three-eyed people.


MARVIN ZELEN


QUERY 116: W. T. Federer and C. S. Schlottfeldt in a recent issue of
Biometrics (Vol. 10, p. 290) state, “The decision to use covariance to
control gradients after the experimental results have been studied
invalidates the use of tabulated probability values for the standard
tests of significance.” Can the authors elaborate this statement
further? The statement appears to contradict some of R. A. Fisher’s
writing, e.g. in Design of Experiments, and much that is written in
texts on statistical methods.


ANSWER: The statement referred to above does not contradict the
material in R. A. Fisher’s The Design of Experiments or in statistics
texts. To illustrate, consider that two six-sided dice are to be cast
singly. Now, if one die is observed first before
placing bets then the cast of the second die is all that is important. For 
example, suppose that a six is observed on the first die. Now, the 
probability of obtaining any particular total from 7 to 12 on the two dice
is 1/6. That is, only the result of the second die counts in computing 
the probabilities. The probabilities of obtaining the numbers 2 to 12 
resulting from casting two dice simultaneously cannot validly be used 
for the “result guided procedure” described above. 
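
The dice illustration can be verified by enumeration. The sketch below compares the distribution of the total of two dice cast simultaneously with the distribution of the total once the first die has been observed; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)


def unconditional_total_distribution():
    """Distribution of the total when two dice are cast simultaneously."""
    dist = {}
    for a, b in product(FACES, FACES):
        dist[a + b] = dist.get(a + b, Fraction(0)) + Fraction(1, 36)
    return dist


def conditional_total_distribution(first):
    """Distribution of the total once the first die is known to show `first`."""
    return {first + b: Fraction(1, 6) for b in FACES}


if __name__ == "__main__":
    simultaneous = unconditional_total_distribution()
    after_six = conditional_total_distribution(6)
    for total in sorted(after_six):
        print(total, "simultaneous:", simultaneous[total],
              "after observing a six:", after_six[total])
```

A total of 12, for example, has probability 1/36 when both dice are cast together but 1/6 once a six has been seen on the first die, so the table appropriate to the first procedure does not apply to the second.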

If the experimental results are studied to determine which covariate 
will reduce the experimental error, the tabulated probability levels for 
t, z, F, $\chi^2$, etc. cannot validly be used to test these experimental results.
However, if the experimenter decides on the covariate prior to studying 
the experimental results, then the tabulated levels of the various tests 
of significance may validly be used. 

Little published material is available on the problem of first studying 
the experimental results and then deciding what to do next. Dr. T. A. 
Bancroft, Iowa State College, and his students have made a start on the 
problem of using “result guided procedures”.


W. T. FEDERER


Joint Meeting of the Institute of Mathematical Statistics and The
Biometric Society (ENAR) April 22-23, 1955 Chapel Hill, N.C.


304 SAMUEL W. GREENHOUSE. (National Institute of Mental Health and
George Washington University.) Information and Distance Applied to
Discriminant Analysis Between Two Normal Populations.


Given two $k$-variate normal populations $\pi_1$ and $\pi_2$, with parameters
$\mu_{(p)}$ and $\sigma_{(p)}$ ($p = 1, 2$), where $\mu$ is a vector of means and $\sigma$ the matrix
of variances and covariances, Kullback defined the mean information
in an observation $X (= X_1, X_2, \cdots, X_k)$ drawn from $\pi_1$, in discriminating
between $\pi_1$ and $\pi_2$, as $I(1:2) = \int f_1 \log (f_1/f_2)\,dx$. With a similar
definition for $I(2:1)$, he defined distance as $J(1, 2) = I(1:2) + I(2:1)$
$= \int (f_1 - f_2) \log (f_1/f_2)\,dx$.
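
For a single variate ($k = 1$) the quantities $I(1:2)$, $I(2:1)$ and $J(1, 2)$ have a simple closed form, which may help fix ideas. The sketch below uses the standard expression for the mean information between two univariate normal densities; it is a special case chosen for illustration, not the $k$-variate treatment of the paper, and the function names are assumptions.

```python
import math


def information(mean_a, sd_a, mean_b, sd_b):
    """I(a:b) = integral of f_a log(f_a / f_b) dx for univariate normals."""
    return (math.log(sd_b / sd_a)
            + (sd_a ** 2 + (mean_a - mean_b) ** 2) / (2.0 * sd_b ** 2)
            - 0.5)


def distance(mean_1, sd_1, mean_2, sd_2):
    """J(1, 2) = I(1:2) + I(2:1)."""
    return (information(mean_1, sd_1, mean_2, sd_2)
            + information(mean_2, sd_2, mean_1, sd_1))


if __name__ == "__main__":
    # Equal variances: the two informations coincide.
    print(information(0.0, 1.0, 2.0, 1.0), information(2.0, 1.0, 0.0, 1.0))
    # Unequal variances: I(1:2) and I(2:1) differ.
    print(information(0.0, 1.0, 0.0, 2.0), information(0.0, 2.0, 0.0, 1.0))
    print(distance(0.0, 1.0, 0.0, 2.0))
```

With equal variances $I(1:2) = I(2:1)$, so $J(1, 2)$ is simply twice either one; with unequal variances the two informations differ.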

In discriminant analysis, one seeks a linear function of the $x$’s to
distinguish between $\pi_1$ and $\pi_2$. In this paper both information and
distance are maximized in two situations: $\sigma_{(1)} = \sigma_{(2)}$ and
$\sigma_{(1)} \neq \sigma_{(2)}$.
In the former situation the same linear discriminant is obtained as that
found by Fisher and is equivalent to the likelihood ratio solution. In
the latter case, the same principle of maximizing information and
distance is used to obtain a linear discriminant. Here, however, max
$I(1:2)$, max $I(2:1)$ and max $J(1, 2)$ yield different functions. Errors
of classification are investigated for each function and compared with 
the errors associated with linear functions obtained by other means. 


305 JOHANNES IPSEN. (Harvard School of Public Health.) Appropriate
Scores in Bio-assays using Death-Times and Survivor Symptoms.


Many bio-assays can be arranged in an ($m \times k$) contingency table
with $k$ doses and $m$ categories of biological observation, ranked in
order of increasing effect of treatment (e.g., survival times, survival
with symptoms, and survival without symptoms). The author
obtained a set of $m$ scores that satisfies one criterion of an efficient
bio-assay:

The variance of the linear regression of the mean scores on log dose
is the highest possible fraction of the total variance.

A procedure is described for combining data from separate experiments
of similar kind, to obtain a score system so that the common
slope is the maximum possible fraction of the total variance adjusted
for individual means.

Significance tests for different score systems are described and the 
method is applied to an inter-institutional assay of tetanus toxoid 
comprising 96 experiments. 


306 D. G. HORVITZ, J. FLEISHER, and A. L. FINKNER. (North Carolina
State College.) A Comparison of Random and Non-Random Plot Selection.


The Agricultural Marketing Service of the United States Department 
of Agriculture is engaged in an extensive research program of objective 
sampling and measurement methods with a view toward improvement 
of crop acreage estimates and production forecasts. Included in the 
program is an investigation of the association of observable cotton 
plant characteristics during the growing season with final yield in order 
to develop a reliable production forecasting procedure. The plant data, 
including boll counts, are collected from small plots within sample 
fields. 

Chain measurements of dimensions on a sample of 60 cotton fields 
in three North Carolina counties permitted random selection of plots 
within these fields and hence an evaluation of less costly non-random
methods of locating similar sized plots. Four non-random plot selection 
schemes were examined, each scheme yielding a pair of double row 
plots 10 feet in length. The first of these schemes selected a border 
plot and an interior plot, the second selected an end of row plot and an 
interior plot, the third and fourth both selected a pair of interior plots. 
One of the four schemes was assigned at random to each sample field; 
two random plots were also selected from each sample field. 

In addition to comparison of the mean boll counts on September 1 
and at harvest, the data were analyzed to determine the contributions 
of the various error components to the total error. The schemes using 
pairs of interior plots yielded positively biased boll counts on both 
occasions while those consisting of a border or end of row plot and an 
interior plot were negatively biased. The latter schemes exhibited two 
to three times the variability of the schemes consisting of two interior 
plots. The greatest portion of this difference is accounted for by the 
large variability of the individual field biases for the non-random 
schemes using a border or end of row plot. The covariance between 
the individual field biases and the true field average also contributed 
considerably to the magnitude of the mean square errors.

The four non-random plot selection schemes taken as a group indicate
undue emphasis on border and end-row plots. The statistical efficiency 
of the group relative to random plot selection was estimated to be 70.6% 
for September 1 boll counts and 61.2% for final boll counts. A distribu- 
tion of the non-random plots which increases the ratio of interior plots 
to border and end of row plots should reduce the net bias and raise the 
efficiency. 


307 M. C. K. TWEEDIE. (Virginia Polytechnic Institute.) Some
Applications of a Special Lemma on Characteristic Functions.


R. A. Fisher (Proc. Royal Soc. London, A, 144 (1934)) showed that
in some families of distribution functions a parameter appeared in such 
a position that the characteristic function could be evaluated without 
integration. This note applies this idea to further problems, and shows 
that precisely chi-square distributions can arise in more general cases 
than directly from normal or other chi-square distributions. 


308 G. S. WATSON. (Australian National University, Canberra.)
Contingency Tables with Missing or Mixed-Up Cell Entries.
(By Title)


In analysis of variance, missing or mixed-up entries may be dealt 
with by well-known methods. The same problem seems to have been 
overlooked in the analysis of frequency data. It is shown in this paper, 
however, that the method of maximum likelihood leads to easy solutions 
of these problems in the analysis of contingency tables. 


German Section of the Biometric Society at Bad Nauheim 
(Kerckhoff-Institute) January 28-30, 1955 


309 R. K. BAUER, Munich. Experiences with discriminant functions.


Since it has been proved that the Fisher-Welch analysis yields 
optimum separation, the question has been settled which method of 
statistical balance should be used in diagnosing paternity. Ludwig 
suggested the application of the Penrose-Smith analysis. Then the 
assumptions may be weakened which have to be made on the separating 
traits, i.e. on the hereditability of the morphologic and physiologic 
items. A certain degree of freedom is gained in defining hereditability.
The most serious of the remaining assumptions, the homogeneity of
variances in the collectives which have to be separated, may be ap- 
proached empirically. Under unfavorable conditions it may be either 
enforced with usual methods or even avoided. Statistical procedures
are available for choosing the traits, a choice which used to be made
by authority. Tests are available for the significance of statements,
for the comparison of discriminant functions, and for the size of an
interval of indifference. In diagnosing paternity by using a statistical balance
it becomes possible for the first time to base the plausibility of a judge- 
ment on probability theory. Special observations may be dealt with 
by introducing a priori probabilities. 


310 H. DRUCKREY, Freiburg i.Br. Theoretical interpretation of the 
processes underlying pharmacological effects. 


The relationship between dosage and effect is developed by using 
‘dimensional equations’ in order to indicate the dimensions and mutual 
connections of variables on which pharmacological effects depend. 
At the same time an attempt is made to define basic concepts of pharma- 
cology more precisely, e.g. poison, dosage, effect. 

According to the ‘theory of hits’ the primary assumption is, that 
molecules of a poison act on particular ‘receptors’ of cells. The formu- 
lation of this phenomenon by using a dimensional equation corresponds 
in principle to the scheme for the kinetics of a bimolecular reaction. 
For the case of equilibrium an algebraic development yields results 
identical to formulae of the law of mass action, of isothermal adsorption, 
of diffusion, to empiric equations by A. J. Clark or A. Rosenblueth for 
the dosage-effect relationship, and finally to the ‘logit’ representation. 
The curves are hyperbolas. A linear function results if logarithms are 
taken of both members of the basic equation. A new probability grid for 
the dosage-effect relationship is based on this fact. At the same time 
it is explained that symmetric or linear functions are usually found
only by plotting against the logarithm of dosage.
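
The linearization mentioned above can be illustrated for the simplest hyperbolic dose-effect curve. The sketch assumes the mass-action form $y = D/(D + K)$, where $D$ is the dose and $K$ a constant (the dose giving half the maximal effect); both the form and the value of $K$ used here are assumptions for illustration and are not taken from the abstract.

```python
import math


def fraction_effect(dose, k):
    """Hyperbolic (mass-action) dose-effect curve y = D / (D + K)."""
    return dose / (dose + k)


def logit(y):
    """log of y / (1 - y)."""
    return math.log(y / (1.0 - y))


if __name__ == "__main__":
    k = 4.0  # hypothetical constant, for illustration only
    for dose in (0.5, 1.0, 2.0, 4.0, 8.0, 16.0):
        y = fraction_effect(dose, k)
        # Since y / (1 - y) = D / K, the logit equals log D - log K,
        # a straight line of unit slope in log dose.
        print(dose, round(y, 4), round(logit(y), 4),
              round(math.log(dose) - math.log(k), 4))
```

Under this assumed form the logit of the effect is linear in the logarithm of dosage, not in the dosage itself.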

A further numerical elaboration of dimensional equations gives 
significant information on the dimensions of variables on which the 
effect depends. Even the ‘individual variation’ may be referred to 
certain variables. The effect of a poison does not depend exclusively 
on the dosage, but on its ratio to the number of particular receptors 
in the effective volume and on the quotient of the two ‘time constants’ 
for the start of an effect and its reversibility. The constant of
reversibility is dominant (cf. the binding of carbon dioxide or oxygen
to hemoglobin). The size of this constant determines the type of a
poison. If it
is small, the effect depends on concentration. If it is larger, a partial 
accumulation of effects exists. If the constant approaches infinity, 
i.e. if the effects are irreversible during the period of observation, the 
effects are added. This happens, for instance, for cancer inducing agents. 
The equations for irreversible summation are identical to well known 
formulae of the ‘theory of hits’. Completely separated phenomena 
may be reduced to the same basic equation. This agreement supports 
the hypothesis that all these processes are ruled by statistical laws. 
They must be reducible to quanta and therefore could be described by 
probability theory. But the equations and curves are trivial. No 
conclusions are possible about the underlying elementary processes. 

It is usual for pharmacological experiments that we do not observe 
the primary effects, but only consequences which may be the results 
of a long chain of consecutive reactions. Each step may be reversible 
or irreversible. For the occurrence of a summation of effects it is 
sufficient that a single step is irreversible. If two steps are irreversible, 
an ‘amplifying effect’ exists which in principle corresponds to the 
integral of concentration over time, multiplied by two. 

Finally it is considered how the dosage-effect relationship depends 
on the individual variation in mixed populations. It is emphasized 
that according to experimental experiences the difference between the 
sexes of a strain may be larger than that between two different strains. 


311 H. GAUL, Voldagsen, and H. MUENZNER, Goettingen. Determination
of the number of homologous chromosomes in bastards of different
species and subspecies.


Problems on the homology of chromosomes in bastards of different 
species or subspecies are theoretically important with respect to the 
mutual relationship and the phylogeny of the parents which are used 
in the crossing. They are essential also in practical breeding of plants. 
Bastards of different species or subspecies show a variability of the 
numbers of chiasms and bivalent chromosomes in the cells of the pollen. 
Therefore it has not been possible yet to gain exact information on the 
number of homologous chromosomes by making cytological observa- 
tions. Empirically a parabolic dependence has been found between 
the number B of fixed chromosomes and the number X of chiasms. 
By using a combinatorial reasoning the same parabola is to be expected, 
assuming that the chiasms are distributed randomly in the set of paired 
chromosomal segments. The parameter of the parabola enables us to
estimate the number P of homologous chromosomes which are eligible
for joining each other. Finally other models are tested with respect 
to their agreement with empirical findings. 


312 F. KEITER, Hamburg. Biometry and hereditary traits depend- 
ing on many genes. 


A heredity which depends on many genes (better: on many factors 
since genes are not the only participants) is revealed by the variation 
of the trait in the population. Continuous, unimodal, symmetric 
variation is to be expected for the case of many collaborating genes, 
whereas discontinuous, asymmetrical variation corresponds to a single 
active gene. More than one mode may occur if a main gene and ac- 
cessory genes participate, or if the influence of the environment is 
substantial. If the variation is plotted on the correct scale, traits 
depending on many genes prevail over those based on single genes, at 
least in normal anthropology. 

In special heredity studies (parents-offspring comparison) the 
average of the children is found between the average of the parents 
and the mean of the total population. The variance is wide, only about 
10% less than the variance of the population. The regression to the 
mean was only 15% for traits determined by impression, about 30% 
for measurements on adult offspring, about 45% for measurements on 
non-adult offspring. Mutually similar parents do not have more similar 
children than different parents. Evidently they are heterozygotes to 
the same degree. Children of certain combinations of parents have 
symmetrical, even normal distributions. The distribution stays 
symmetrical even for extreme values. 

The same phenomena which belong to a polyfactorial heredity may 
occur with a single active gene if types of families exist in the population, 
representing a different heredity of the same trait. This is well known 
for hereditary diseases. There seems to be no possibility of separating 
the two cases. 

Differences of the heredity of polyfactorial traits occur mainly 
because of a different regression to the average, less frequently because 
of a different variance of the children. The general scheme of the 
heredity of these traits, i.e. of almost all traits dealt with in normal 
anthropology, should be analogous to a high degree. This corresponds 
to the actual findings of critical values. For all possible combinations 
of child, mother, father the frequency for paternity is divided by the 
frequency without paternity. This ratio is the critical value. The 
critical values (proving values) are small for most single traits.
Nevertheless they become very high for a combination of series of traits. For
every polyfactorial trait there are combinations which exclude a 
paternity. Regions of variation exist which are impossible for children 
of certain combinations of parents. Usually the negative proofs are 
more convincing than the positive ones. A biometrically correct 
treatment of polyfactorial traits results in statements for diagnosing 
paternity which are as clear as those based on classic Mendelian 
heredity. Being an empirical hereditary prognosis, the method is free 
of hypotheses which are hard to verify. 


313 P. KNEIP, Cologne. Remarks on the evaluation of quantitative
dosage-effect experiments. 


A series of tests is not completely evaluated if DL 50 and its variance 
have been determined. Further knowledge about drugs may be gained 
by plotting DL 50 against the duration of the experiment. This 
additional information does not depend on supplementary experimental 
animals. The method is simple and can be included in routine tests. 


314 S. KOLLER, Wiesbaden. Checking homogeneity if the regres- 
sions of several systems of correlation are analyzed. 


As an example of an analysis of covariances the regression lines in 
subsets of a large mass of data are compared with respect to their 
stability. These data belong to studies on the correlation between 
hemoglobin contents (in ccm blood) and surface area of erythrocytes 
(in ccm blood). In this example contradictions occur if one and the
same relationship is assumed for men, women, and newborn infants. 
Checking the stability of a regression line in subsets of data corresponds 
under certain conditions to a test for the direction of the relationship. 
If actually X prevails over Y, the flat regression lines agree; if Y prevails
over X, the steep regression lines are stable. It is assumed that no 
disturbing factors occur. 


315 W. LUDWIG, Heidelberg. Remarks on elementary problems 
which arise frequently in biometric routine work. 


As an introduction to the following “Discussion of Queries”
elementary statistical problems are chosen which according to practical 
experiences arise again and again. An attempt is made to indicate
convenient methods which yield an accuracy in general sufficient for
biological and medical research. 

(1) Deletion of apparently extreme values of a small sample (normal 
distribution). (2) Guessing of a significant deviation from a normal 
distribution in small samples. (3) Separation of a non-normal dis- 
tribution into two normal components if there is a hypothesis that the 
population is a mixture of two normal collectives. (4) The coefficient 
of variation. (5) Comparison of two means for a weak relationship 
(normal distribution). (6) The Brandt-Snedecor formula for very small 
samples. (7) Comparison of an empirical and a theoretical frequency 
or of two empirical frequencies for assumed binomial distribution and 
very small samples. (8) $2 \times 2 \times 2$ table and related topics.


316 W. LUDWIG, Heidelberg. Stochastic reasoning in diagnosing 
paternity. 


A coefficient of plausibility Pl is defined that a defendant C; , named 
by the mother C; of the infant, is really the father of the suing child. 
The general concept of ‘combination of degrees of traits (C)’ is applied. 
Genetic and social-biological indications to paternity are separated. 
The result is a ‘generalized and corrected Essen-Moeller formula’. At 
the same time statements are possible under which restricting assump- 
tions the classic Essen-Moeller formula and other equations stay correct. 


317 E. WALTER, Goettingen. Components of covariance. 


The covariance may be split into components like the variance. The 
underlying model is described for the case of a simple classification. 
Sufficient methods for the computation of confidence intervals have not
been developed yet, and tests are lacking. Therefore the application of
distribution-free procedures is discussed for a numerical example. This
method may be used in animal husbandry for estimating the genetic 
correlation. 


318 H. BAITSCH, Munich. Biometry and problems of a correlation 
between traits. 


There are two main causes for a correlation of traits. One is the 
correlation following from a common causal (genetic) source. Then a 
complex of many traits is reduced to few arbitrary measurements. The 
other main cause for a correlation of traits is an inhomogeneity, an
incomplete mixture in the observed total population. Errors result
from these correlations. In order to avoid them in a usual balancing 
procedure, a limitation to uncorrelated traits, if possible of a highly 
convincing kind, is recommended. Otherwise the various partial corre- 
lations have to be computed. The efficiency of the tested traits has to 
be reduced according to these partial correlations. Consequences of an 
incomplete mixture cannot be annulled by using such methods. Another 
solution can be found by applying a discriminant function instead of a 
balance. Consequences of a correlation of traits are—at least partially— 
eliminated automatically. Problems resulting from an incomplete 
mixture may be attacked more easily with these procedures. 


Biometric Society (British Region). The twentieth meeting of the Region
was held at the Wellcome Research Institute, 183 Euston Road,
London, N.W. 1, at 2:30 p.m. on Wednesday, 14th April
1954. The following papers were read and discussed:


319 P. ARMITAGE, A. W. DOWNIE and K. McCARTHY. 
Variations in counts of smallpox virus lesions. 


When a suspension of smallpox virus is inoculated into a number of 
eggs, the variation in the count (i.e. number of lesions per egg) may be 
much higher than would be expected from a Poisson distribution. 92 
groups of replicate counts were examined, and by working with log 
count and also using a logarithmic transformation of the variance of 
log count, a simple empirical relationship was established between the 
variance of the count and its mean ($\sigma^2 = 13.6\,\mu$). There were no
significant differences between groups in this relationship. It is proposed 
that, in comparing the means of two small groups of counts, the standard 
error of the difference could be estimated from this empirical formula, 
so as to provide a more powerful test than the t-test.


320 D. R. COX (Statistical Laboratory, University of Cambridge).
The design of an experiment in which some treatment arrangements
are inadmissible.


Consider an experiment in which the experimental units are arranged 
in sets of k units, a set corresponding, for example, to a single production 
run of an industrial process. Suppose also that the k units in each set 
are arranged in order corresponding to the first, second, etc. period of 
the set, and that for practical reasons there is a restriction on the order
of treatments within each set, such as that the level of the treatment
must not decrease from one period to the next in a set. This paper is 
concerned with designs for such a situation; the method of construction 
is described and designs are given for a few special cases. Dr. C. J. 
Anson, G. K. N. Group Research Laboratory, suggested the problem; 
it arose in connection with an experiment on the properties of alloys 
made from high purity metals. 


321 F. YATES. The combination of data from a set of 2 × 2 tables.


If a pair of treatments is such that their effects can only be measured 
by quantal (“all-or-nothing”) responses, the results of an experimental
comparison of the two treatments can be arranged in the form of a
2 × 2 contingency table. When several such experiments are carried
out, direct pooling of the results can be misleading if there is
heterogeneity between different experiments. To avoid pooling, such
data have often been analysed by calculating the significance level of
each experiment separately and forming a combined significance test. 
This method, however, is inefficient, and also fails to provide a quantita- 
tive estimate of the difference between the treatments. A more 
satisfactory approach is to obtain a direct estimate of the difference 
(together with its standard error). If the numbers of observations in 
the separate experiments are small a maximum likelihood solution 
based on one of the well-known transformations (log log, logit or probit) 
should be used. The appropriate method of analysis will be described 
and illustrated by application to a genetical example. The method can 
easily be extended to sets of experiments involving more than two 
treatments. 
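
The abstract does not give the computations. As a hedged sketch of what a direct combined estimate with a standard error can look like, the following forms an inverse-variance weighted mean of the logit differences from the separate 2 × 2 tables. This is a simpler large-sample stand-in, not the maximum likelihood solution that the abstract recommends when the numbers are small; the function name and the data layout are hypothetical, and every cell is assumed to be non-zero.

```python
import math


def combined_logit_difference(tables):
    """Weighted combination of logit differences over several 2 x 2 tables.

    tables: list of (r1, n1, r2, n2), where r is the number responding and
    n the number tested under treatments 1 and 2 in one experiment.
    Returns (estimate, standard_error) on the logit scale.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for r1, n1, r2, n2 in tables:
        a, b = r1, n1 - r1  # treatment 1: responders, non-responders
        c, d = r2, n2 - r2  # treatment 2: responders, non-responders
        if min(a, b, c, d) <= 0:
            raise ValueError("this sketch requires all four cells to be positive")
        diff = math.log(a / b) - math.log(c / d)  # difference of logits
        var = 1 / a + 1 / b + 1 / c + 1 / d       # large-sample variance
        weight = 1 / var
        weighted_sum += weight * diff
        weight_total += weight
    # Each experiment contributes its own comparison, so results are never pooled.
    return weighted_sum / weight_total, math.sqrt(1 / weight_total)
```

Because each table is compared within itself before the results are combined, heterogeneity in the overall response level between experiments does not distort the estimate in the way that pooling the raw counts can.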


The Biometric Society, British Region. Wednesday, January 26, 1955


322 M. R. SAMPFORD. The Use of Litter-Mates in Response- 
Time Experiments. 


(The use of litter-mates in comparative trials in which time to 
response is the observed variate is of considerable value in reducing 
error, but leads to complications in the analysis when some animals 
fail to respond before observation is suspended, or do not show the 
response at all. These two situations are discussed, and appropriate 
methods of analysis are outlined).


Abstracts of Papers presented before the British Region on March 3, 1955


323 R. E. BLACKITH. The Analysis of Social Facilitation at the
Nest Entrances of Some Hymenoptera.


The passage of unmarked social hymenoptera in and out of their 
nests is decisively non-random, grouping being demonstrated with both 
wasps and bumble bees. The observed distributions follow the negative 
binomial, one plausible interpretation of which assumes that workers 
are inhibited from passing through the nest entrance until sufficient 
individuals have accumulated to act as a releaser. Most workers of the 
red wasp Vespula rufa are released by the accumulation of from one to 
three further workers at the entrance. Other species of wasp seem to 
have less marked inhibitions. Young queen bumble-bees (Bombus 
lapidarius) have significantly higher inhibitions than have workers of 
this species. Worker wasps may obtain their releaser from individuals 
passing in the opposite direction only when insufficient pass in the 
same direction. 

Different types of test reveal non-random passage of the nest entrance 
when many or when fewer insects are active. Grouping may be measured
by the entropy of social organization. Some methods of estimating the 
number of workers foraging and of the mean duration of a flight, depend 
on a complete return of workers to the nest at night. A dawn to dusk 
record shows that this return may be far from complete, leading to 
biassed estimates. 


324 CEDRIC A. B. SMITH. An Estimation procedure for proportions,
with genetical applications.


Many parameters of genetical interest are the frequencies of par- 
ticular types of events or objects: for example, gene frequencies, 
frequencies of recombination, “penetrance” or manifestation frequency, 
and so on. If we have a series of trials in each of which it is known 
whether the event in question has or has not occurred, or object been 
present, then the frequency is estimated as the proportion of such 
events in the whole sample, and the usual binomial formula gives the 
standard error. This applies, for example, to the estimation of the MN 
blood group gene frequencies by simple counting of genes in a sample 
of unrelated individuals. Complications are introduced by effects like 
dominance, which makes it uncertain exactly which genes are present 
(a group B individual can be genetically BB or OB), and by family 
data, in which the same gene may recur among different members of the
same family. A counting method can still be used for estimation. Thus,
in considering blood groups, we take provisional values of the gene 
frequencies, estimate from these how many B individuals are in fact 
BB, and how many OB, count genes, and so obtain improved estimates. 
An iteration leads to the final estimates, which (correctly calculated) 
can be shown to be maximum likelihood estimates. However, the
process is purely numerical, avoiding the use of calculus. The variance 
of the estimates follows from suitably modified binomial or multinomial 
formulas, and the usual maximum likelihood theory can be applied 
to give heterogeneity tests, etc. The method is applicable whenever 
the probability of the observed sample is a rational function of the 
unknown parameters. 
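
A minimal sketch of such a counting iteration for the ABO blood groups, assuming random mating (Hardy-Weinberg proportions) and a sample of unrelated individuals. The function name, the starting values and the stopping rule are assumptions made for illustration, not details taken from the paper.

```python
def gene_count_abo(n_a, n_b, n_ab, n_o, tol=1e-10, max_iter=1000):
    """Estimate ABO gene frequencies (p for A, q for B, r for O) by counting genes.

    n_a, n_b, n_ab, n_o: observed numbers of individuals of each blood group.
    """
    n = n_a + n_b + n_ab + n_o
    p = q = r = 1.0 / 3.0  # provisional gene frequencies (assumed start)
    for _ in range(max_iter):
        # Split group A into AA and OA, and group B into BB and OB, using the
        # provisional frequencies: P(AA | group A) = p^2 / (p^2 + 2pr) = p / (p + 2r).
        n_aa = n_a * p / (p + 2 * r)
        n_ao = n_a - n_aa
        n_bb = n_b * q / (q + 2 * r)
        n_bo = n_b - n_bb

        # Count genes among the 2n genes in the sample.
        p_new = (2 * n_aa + n_ao + n_ab) / (2 * n)
        q_new = (2 * n_bb + n_bo + n_ab) / (2 * n)
        r_new = (n_ao + n_bo + 2 * n_o) / (2 * n)

        change = max(abs(p_new - p), abs(q_new - q), abs(r_new - r))
        p, q, r = p_new, q_new, r_new
        if change < tol:
            break
    return p, q, r
```

Each pass uses only arithmetic on counts, which is the sense in which the process avoids calculus. The standard errors and heterogeneity tests mentioned in the abstract are not covered by this sketch.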


ERRATA: W. T. Federer and C. S. Schlottfeldt,
The Use of Covariance to Control Gradients in Experiments, June, 1954. 


Gratitude is expressed to Prof. Gertrude M. Cox for pointing out
some computational errors on pages 288 and 289, Volume 10, of the
article entitled “The Use of Covariance to Control Gradients in
Experiments.” $b_{y1.2} = 0.198933$ should read $b_{y1.2} = 19.893256$. The
corrected values for columns 5, 7 and 8 in Table VII are:


               Adjustments for                           Adjusted    Adjusted
$b_{y1.2}(X_1 - \bar{x}_1)$  $b_{y2.1}(X_2 - \bar{x}_2)$    Total       Mean

        −39.787                                          8755.975    1094.50
        −59.680                                          8700.407    1087.55
         99.466                                          8817.930    1102.24
         59.680                                          7723.106     965.39
         59.680
        −59.680
        −59.680

General election. As general officers of the Society for 1955, the
Council has re-elected Professor W. G. Cochran of Johns Hopkins 
University, President and C. I. Bliss, Secretary-Treasurer. In a total 
count of 454 individual mail ballots, the following were elected to the 
Council for 1955-57: G. M. Cox, B. B. Day, J. H. Gaddum, M. P. 
Geppert, M. Masuyama, P. A. Moran and J. Neyman. The Society is 
indebted for their services to the retiring Council members for 1952-54:
C. W. Emmens, J. O. Irwin, Arthur Linder, A. M. Mood, C. R. Rao 
and Georges Teissier. 

Biometric Symposium in Brazil. Plans are nearing completion for 
the International Biometric Symposium to be held in Campinas, near 
São Paulo, Brazil, on July 4-8 of this year. A preliminary announcement,
dated April 19, has been sent to a special mailing list of nearly
300 in Latin America. A varied program, still provisional, has been 
arranged for the five days of the Symposium. The opening session will 
feature an address by W. G. Cochran, President of the Society. The 
Symposium will continue in the afternoon with two papers on Bio- 
metrical Genetics, by E. R. Dempster and by Sir Ronald Fisher. 
Experimental Designs for Perennial Crops and for Animal Experiments 
will be discussed on the following day by S. C. Pearce, C. Fraga and A.
Conagin, G. M. Cox, F. Pimentel, P. G. Homeyer, W. J. Youden, and
Arthur Linder. That evening Professor Th. Dobzhansky will lecture
in Portuguese on “Genetica and Heterose”. A session the following
day on Medical Statistics will present papers by J. O. Irwin, J. Manceau, 
A. E. Brandt, and A. Vessereau. The rest of the day has been left free 
for excursions. On Thursday, different aspects of Sampling Techniques 
will be considered in the morning by M. H. Hanson, P. V. Sukhatme 
and V. G. Panse, E. Cansado, and J. Nieto de Pascual. A panel dis- 
cussion on Experimental Designs is scheduled for that afternoon. The 
Friday sessions concern Bioassay, with papers in the morning by C. I. 
Bliss and by D. J. Finney, followed in the afternoon by a panel dis- 
cussion on Statistical Problems in Bioassay submitted by those attend- 
ing. Anyone interested in receiving announcements of the Symposium 
is invited to write to the Secretary of the Biometric Society, Box 1106, 
New Haven 4, Connecticut. 

IUBS. At the 12th General Assembly of the International Union 
of Biological Sciences on April 12-16 in Rome, the Biometric Section 
of the IUBS, which is provided by the Society, was represented by 
L. L. Cavalli-Sforza of Milan and A. Vessereau of Paris. An additional
report has been received from A. Linder of Geneva, past President of
the Society, who attended as Treasurer of the IUBS. During the 
Assembly, the IUBS was reorganized into three main divisions of Plant 
Biology, Animal Biology and General Biology, each with three to five 
sections. The Division of General Biology now comprises the Sections 
of Biometry, Cell Biology, Genetics, Microbiology and Limnology. 
Professor Linder resigned as Treasurer and was replaced by Dr. Lanjouw, 
a botanist from Utrecht, Holland. The President of the IUBS, Dr. 
Hörstadius of Sweden, commended the Biometric Section (Society) on
different occasions as a model which could well be followed by others, 
in particular because of its international, regional and national organiza- 
tion. The Assembly approved support for an International Symposium 
to be held during our Fourth International Conference in Canada in 
1958 on a biometric genetic topic, and will be able to give some financial 
assistance to the Secretary’s office. Continuing support for a European 
Biometric Seminar or Colloquium, which it is proposed to hold annually 
in different parts of Europe, will depend largely upon the state of the 
budget of the IUBS. Although some controversy developed over 
IUBS support for this proposal, it was warmly endorsed by President 
Hörstadius and by other officers of the IUBS. Future subsidies have
yet to be determined by the Executive Committee of the Union.

Biometric Colloquium in Italy. The European Seminar or Colloquium
in Biometry, noted in BIOMETRICS for March, will be held at Varenna, 
Italy, on September 7-23, 1955. The following report is based upon the 
recent announcements issued by the Italian Region. The Seminar is 
open to graduates in medicine and surgery, in veterinary medicine, in 
the biological and other natural sciences, in agriculture, and in pharmacy, 
who wish to improve their knowledge of biometry for purposes either 
of teaching or of research. Three basic courses will be offered in Italian 
on (A) Fundamental Theory by M. P. Geppert of the W. G. Kerckhoff- 
Herzforschungs Institut, Bad Nauheim, Germany, (B) Design of 
Experiments by F. Anscombe of the Statistical Laboratory, University 
of Cambridge, England, and (C) Analysis of Variance and Covariance 
by C. A. B. Smith, Galton Laboratory, University College, London. 
Practical exercises in application will form part of the last two courses. 
Additional lectures have been arranged, both general and on specialized 
topics, including bioassay, animal husbandry, agricultural experiments, 
medicine and hygiene, and statistical genetics. Among the visiting 
professors on this part of the program are G. Barbensi, F. Brambilla, 
Sir Ronald Fisher, G. Pompilj and A. Tizzano. Problems submitted 
by participants in the Colloquium will be discussed in general seminars.
An attendance of about 25 is anticipated. Applications for admission
to the Seminar and requests for further information should be sent at
once to Professor L. L. Cavalli-Sforza, Istituto Sieroterapico Milanese
S. Belfanti, Via Darwin 20, Milano, Italy, giving full information about
the preparation of the applicant. 

Syllabus on Biometry. During its Assembly in Rome, the IUBS 
sponsored a Symposium on “Problems of International Concern in the 
Life Sciences”. A session on Education was chaired by Dr. Paul Weiss 
of the Rockefeller Institute for Medical Research in New York. At 
Dr. Weiss’ request, a six-page mimeographed report on “Biometric 
Needs and Opportunities in Biological Education” was prepared in the 
Secretary’s office. Based upon a statement by President Cochran, it 
was revised and expanded with the aid of 20 replies from members of 
the Society in the United States, Great Britain and Europe. The 
report reviews briefly the content and approach in a non-mathematical 
introductory course on the statistical aspect of biometry, the additional 
topics which might be considered in further or more specialized training, 
the place of laboratory work and conferences, the role of a statistician 
in biological research, and the place of refresher courses and of work- 
shops or colloquia for the professional biologist. Members of the 
Society can obtain copies of this report on request from the Secretary’s 
office. 

German Region. The members of the Biometric Society in Germany 
held their third meeting and second Biometric Colloquy at the Kerckhoff- 
Institute in Bad Nauheim on January 28-30, 1955, with more than 120 
persons in attendance, among them 40 members of the Society. The 
opening session on “Analysis of covariance” offered introductory reports
by H. Münzner, S. Koller, C. Harte and E. Walter; the afternoon
session on “Dose-response-curve” reports by R. Prigge, H. Druckrey,
K. Soehring and K. Sommermeyer from the point of view of immunology, 
pharmacology and radiology. This topic was continued the second day 
with original papers by P. Kneip, A. Beckel and L. Schmetterer. The 
third day’s program on “Biometric methods of paternity-diagnosis” 
consisted of papers by H. Gaul and H. Münzner, F. Keiter, H. Baitsch,
W. Ludwig, W. Bauermeister and R. K. Bauer. On the second day a 
business meeting was followed by a discussion on “Unification of
biometrical terminology (terms and symbols)”. In a session on
“Questions from practical biometric work”, opened by a paper of W.
Ludwig, 8 queries presented by the participants in the Colloquy were 
discussed at length. 

During the Colloquium, the business meeting on January 29 dis- 
cussed at length the organization of the German Region of the Society, 
voted to form the Region, adopted statutes, and fixed the Regional 
dues. A later mail ballot named the following as Regional officers:
President, E. Ullrich; Secretary-Treasurer, W. Ludwig; Regional 
Committee, K. Freudenberg, M. P. Geppert, O. Heinisch and H. 
Münzner.

On March 3-11, Professor R. C. Bose of the University of North 
Carolina gave a series of eight lectures on “Incomplete Block Designs” 
at the University of Frankfurt, to which all German members of the 
Society and other biometricians were invited on behalf of the Society 
and University. Dr. Bose’s lectures were enthusiastically received and 
contributed materially to the development of biometry in Germany. 

British Region. At the meeting of the British Region on April 14, 
1954, at the Wellcome Research Institute in London the following 
papers were presented and discussed: “Variations in counts of smallpox
virus lesions” by P. Armitage, A. W. Downie and K. McCarthy, “The
design of an experiment in which some treatment arrangements are
inadmissible” by D. R. Cox, and “The combination of data from a set
of 2 × 2 tables” by F. Yates. On June 18, 1954, the Region met for
dinner at the Lister Institute, which was followed by demonstrations of 
some of the work in progress. 

The annual meeting of the British Region on January 26, 1955, 
elected the following Regional officers and committee: President, R. R. 
Race; Treasurer, A. R. G. Owen; Secretary, E. C. Fieller; Committee, 
D. J. Finney, M. J. R. Healy, J. A. Fraser Roberts, J. G. Skellam, 
J. M. Tanner, K. D. Tocher, J. W. Trevan, G. E. P. Box, and ex- 
officio Sir Ronald Fisher, J. H. Gaddum and F. Yates. Following the 
annual meeting, three papers were read and discussed: “An unusual 
frequency distribution” by Sir Ronald Fisher, “Estimation of bacteria
in whale meat by dilution methods” by H. W. Daniels, and “The use
of litter-mates in response-time experiments” by M. R. Sampford. 
The Region met again on March 3 at the Wellcome Research Institute 
in London, with the following program: “Analysis of social facilitation 
at the nest entrances of some Hymenoptera” by R. E. Blackith, “An 
estimation procedure for proportions, with genetical applications” by 
C. A. B. Smith, and “Trials of skinfold calipers” by M. J. R. Healy 
and J. M. Tanner. Abstracts are being published in BIOMETRICS.

ENAR. The Eastern North American Region met jointly with 
the Institute of Mathematical Statistics on April 22-23 at the University
of North Carolina in Chapel Hill. At the opening session, invited 
papers by F. S. McFeely, J. E. Freund, T. Horner, I. Miller, H. Bozivich,
and R. L. Wine considered various aspects of life testing, components 
of variance and decision procedures. At the following session D. G. 
Austin, J. Blackman and C. Derman spoke on Probability Theory. 
The afternoon program opened with a session on Multivariate Analysis,
with papers by T. W. Anderson, W. G. Howe and H. C. Sweeny, which 
was followed by nine contributed papers on mathematical statistics. 
A discussion of the relation between smoking and mortality from lung
cancer opened the meeting on April 23, with J. Cornfield and W. Haenszel
as principal speakers, and B. Harshbarger and D. Horn as discussants.
The morning program concluded with a session of five papers on mathe- 
matical statistics. The afternoon program included contributed papers 
on problems in discriminant analysis by S. W. Greenhouse, in bioassay 
by J. Ipsen, in plot selection by D. G. Horvitz and J. Fleischer, in 
characteristic functions by M. C. K. Tweedie, and by title, on con- 
tingency tables with missing or mixed-up cell entries by G. S. Watson. 

Abstracts of the Society sessions are printed in this issue of 
Biometrics; those of the joint sessions and others will appear in the 
Annals of Mathematical Statistics. 


French Region. At the most recent meeting of the Société Française
de Biométrie, held on Wednesday, February 9, at the École Normale
Supérieure in Paris, Mr. S. Lédermann gave a lecture on
“Cancer, Alcohol, and Tobacco” and Messrs. J. Sutter and L. Tabah
discussed “Research on Mortality from Aging”. During this meeting
the election to renew the Council and the Bureau took place.
Mr. David Schwartz was elected Secretary-Treasurer and
Mr. J. M. Faverge was elected a member of the Council.